BIDS Workflow#
This guide covers working with Brain Imaging Data Structure (BIDS) formatted datasets in eegprep. BIDS is a standardized format for organizing neuroimaging data, making it easier to share and process datasets consistently.
BIDS Dataset Structure#
A typical BIDS EEG dataset has the following structure:
dataset/
├── sub-01/
│   ├── ses-01/
│   │   └── eeg/
│   │       ├── sub-01_ses-01_task-rest_eeg.set
│   │       ├── sub-01_ses-01_task-rest_eeg.fdt
│   │       ├── sub-01_ses-01_task-rest_channels.tsv
│   │       ├── sub-01_ses-01_task-rest_eeg.json
│   │       └── sub-01_ses-01_task-rest_events.tsv
│   └── ses-02/
│       └── eeg/
│           └── ...
├── sub-02/
│   └── ses-01/
│       └── eeg/
│           └── ...
├── derivatives/
│   └── eegprep/
│       ├── sub-01/
│       │   └── ses-01/
│       │       └── eeg/
│       │           └── sub-01_ses-01_task-rest_eeg_preprocessed.set
│       └── sub-02/
│           └── ...
├── README
├── CHANGES
├── dataset_description.json
├── participants.tsv
└── participants.json
Key BIDS Files:
_eeg.set: EEGLAB format EEG data
_eeg.fdt: EEGLAB data file (binary)
_channels.tsv: Channel information (name, type, units)
_eeg.json: EEG metadata (sampling rate, reference, etc.)
_events.tsv: Event markers and timing
dataset_description.json: Dataset metadata
participants.tsv: Participant information
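The tabular sidecar files listed above are plain tab-separated files, so they can be inspected directly without loading the EEG data. A minimal sketch, assuming pandas is installed and using the example paths from the tree above:
import pandas as pd
# Channel names, types, and units for one recording
channels = pd.read_csv(
    'dataset/sub-01/ses-01/eeg/sub-01_ses-01_task-rest_channels.tsv',
    sep='\t'
)
print(channels[['name', 'type', 'units']].head())
# Event onsets and durations (in seconds, per the BIDS specification)
events = pd.read_csv(
    'dataset/sub-01/ses-01/eeg/sub-01_ses-01_task-rest_events.tsv',
    sep='\t'
)
print(events.head())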
Loading BIDS Data#
Using pop_load_frombids#
Load a single file from a BIDS dataset:
from eegprep import pop_load_frombids
# Load a specific file
eeg = pop_load_frombids(
    bids_root='data/bids_dataset',
    subject='01',
    session='01',
    task='rest'
)
print(f"Loaded: {eeg.nbchan} channels, {eeg.pnts} samples")
print(f"Sampling rate: {eeg.srate} Hz")
Parameters:
bids_root: Path to the BIDS dataset root directory
subject: Subject ID (without ‘sub-’ prefix)
session: Session ID (optional, without ‘ses-’ prefix)
task: Task name (optional)
run: Run number (optional)
Loading with Additional Parameters#
from eegprep import pop_load_frombids
# Load with specific run and additional options
eeg = pop_load_frombids(
    bids_root='data/bids_dataset',
    subject='01',
    session='01',
    task='oddball',
    run='01',
    preload=True  # Load data into memory
)
Listing Available Files#
Find all EEG files in a BIDS dataset:
from eegprep import bids_list_eeg_files
# List all EEG files
files = bids_list_eeg_files('data/bids_dataset')
for file_info in files:
    print(f"Subject: {file_info['subject']}")
    print(f"Session: {file_info['session']}")
    print(f"Task: {file_info['task']}")
    print(f"File: {file_info['file']}")
    print()
Running Batch Preprocessing#
Using bids_preproc#
Process all files in a BIDS dataset with a single command:
from eegprep import bids_preproc
# Run preprocessing on entire dataset
bids_preproc(
    bids_root='data/bids_dataset',
    output_dir='data/bids_dataset/derivatives/eegprep',
    overwrite=False
)
Parameters:
bids_root: Path to BIDS dataset root
output_dir: Output directory for preprocessed data
overwrite: Whether to overwrite existing files
n_jobs: Number of parallel jobs (default: 1)
Batch Processing with Custom Parameters#
from eegprep import bids_preproc
# Custom preprocessing parameters
bids_preproc(
    bids_root='data/bids_dataset',
    output_dir='data/bids_dataset/derivatives/eegprep',
    preproc_params={
        'flatline_criterion': 5,
        'highpass': 1,
        'lowpass': 100,
        'asr_criterion': 20,
        'ica': True,
        'iclabel': True
    },
    n_jobs=4  # Use 4 parallel jobs
)
Parallel Processing#
Process multiple subjects in parallel:
from eegprep import bids_preproc
# Process with 8 parallel jobs
bids_preproc(
    bids_root='data/bids_dataset',
    output_dir='data/bids_dataset/derivatives/eegprep',
    n_jobs=8,
    verbose=True
)
Note: The number of jobs should not exceed the number of CPU cores available.
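One way to respect that limit is to derive the job count from the machine's CPU count. A minimal sketch using only the standard library; the bids_preproc call mirrors the example above:
import os
from eegprep import bids_preproc
# Leave one core free for the operating system
n_jobs = max(1, (os.cpu_count() or 1) - 1)
bids_preproc(
    bids_root='data/bids_dataset',
    output_dir='data/bids_dataset/derivatives/eegprep',
    n_jobs=n_jobs
)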
Processing Specific Subjects#
from eegprep import bids_preproc
# Process only specific subjects
bids_preproc(
    bids_root='data/bids_dataset',
    output_dir='data/bids_dataset/derivatives/eegprep',
    subjects=['01', '02', '03']
)
Output Structure#
After running eegprep.bids_preproc(), the output is organized in the derivatives directory:
dataset/derivatives/eegprep/
├── sub-01/
│   ├── ses-01/
│   │   └── eeg/
│   │       ├── sub-01_ses-01_task-rest_eeg_preprocessed.set
│   │       ├── sub-01_ses-01_task-rest_eeg_preprocessed.fdt
│   │       ├── sub-01_ses-01_task-rest_channels.tsv
│   │       └── sub-01_ses-01_task-rest_eeg.json
│   └── ses-02/
│       └── eeg/
│           └── ...
├── sub-02/
│   └── ...
├── dataset_description.json
└── README
Derivatives Format#
The derivatives directory follows BIDS format with:
_preprocessed.set: Preprocessed EEG data
_preprocessed.fdt: Preprocessed data file
_channels.tsv: Updated channel information
_eeg.json: Updated metadata
dataset_description.json: Derivatives dataset description
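To confirm which recordings were actually written, you can glob the derivatives tree directly. A minimal sketch using pathlib; the _eeg_preprocessed.set suffix follows the layout shown above:
from pathlib import Path
deriv_root = Path('data/bids_dataset/derivatives/eegprep')
# List every preprocessed recording relative to the derivatives root
for preproc_file in sorted(deriv_root.rglob('*_eeg_preprocessed.set')):
    print(preproc_file.relative_to(deriv_root))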
Loading Preprocessed Data#
Load preprocessed data from derivatives:
from eegprep import pop_load_frombids
# Load preprocessed data
eeg = pop_load_frombids(
    bids_root='data/bids_dataset/derivatives/eegprep',
    subject='01',
    session='01',
    task='rest'
)
Integration with Other Tools#
Integration with MNE-Python#
Convert eegprep data to MNE format:
from eegprep import eeg_eeg2mne
import mne
# Convert to MNE Raw object
raw = eeg_eeg2mne(eeg)
# Now use MNE functions
raw.plot()
raw.compute_psd().plot()
Converting Back to eegprep#
from eegprep import eeg_mne2eeg
# Convert MNE Raw back to eegprep format
eeg = eeg_mne2eeg(raw)
Integration with EEGLAB#
Save preprocessed data in EEGLAB format:
from eegprep import pop_saveset
# Save as EEGLAB .set file
pop_saveset(eeg, 'preprocessed_data.set')
Load EEGLAB files:
from eegprep import pop_loadset
# Load EEGLAB .set file
eeg = pop_loadset('data.set')
Working with BIDS Metadata#
Accessing Channel Information#
from eegprep import pop_load_frombids
eeg = pop_load_frombids(
    bids_root='data/bids_dataset',
    subject='01',
    session='01',
    task='rest'
)
# Access channel information
for i, chan in enumerate(eeg.chanlocs):
    print(f"Channel {i}: {chan['labels']}")
    print(f"  Type: {chan['type']}")
    print(f"  Location: ({chan['X']}, {chan['Y']}, {chan['Z']})")
Accessing Event Information#
# Access events
if hasattr(eeg, 'event'):
    for event in eeg.event:
        print(f"Event type: {event['type']}")
        print(f"Latency: {event['latency']} samples")
        print(f"Duration: {event['duration']} samples")
Accessing Metadata#
# Access BIDS metadata
if hasattr(eeg, 'etc') and 'bids' in eeg.etc:
    bids_info = eeg.etc['bids']
    print(f"Task: {bids_info.get('task')}")
    print(f"Sampling rate: {bids_info.get('srate')} Hz")
Common BIDS Workflows#
Complete Preprocessing Workflow#
from eegprep import (
    pop_load_frombids,
    clean_artifacts,
    iclabel,
    pop_saveset
)
# 1. Load data
eeg = pop_load_frombids(
    bids_root='data/bids_dataset',
    subject='01',
    session='01',
    task='rest'
)
# 2. Preprocess
eeg = clean_artifacts(
    eeg,
    highpass=1,
    lowpass=100,
    ica=True,
    iclabel=True
)
# 3. Save to derivatives
pop_saveset(
    eeg,
    'data/bids_dataset/derivatives/eegprep/sub-01/ses-01/eeg/sub-01_ses-01_task-rest_eeg_preprocessed.set'
)
Batch Processing with Quality Control#
from eegprep import bids_preproc, bids_list_eeg_files
import json
# 1. List all files
files = bids_list_eeg_files('data/bids_dataset')
print(f"Found {len(files)} EEG files")
# 2. Run preprocessing
bids_preproc(
    bids_root='data/bids_dataset',
    output_dir='data/bids_dataset/derivatives/eegprep',
    n_jobs=4
)
# 3. Create processing report
report = {
    'total_files': len(files),
    'preprocessing_date': '2024-01-01',
    'parameters': {
        'highpass': 1,
        'lowpass': 100,
        'ica': True
    }
}
with open('preprocessing_report.json', 'w') as f:
    json.dump(report, f, indent=2)
Troubleshooting BIDS Workflows#
File Not Found#
Problem: FileNotFoundError when loading BIDS data
Solution:
Verify BIDS dataset structure
Check subject and session IDs
Use eegprep.bids_list_eeg_files() to find available files
from eegprep import bids_list_eeg_files
files = bids_list_eeg_files('data/bids_dataset')
for f in files:
    print(f"sub-{f['subject']}_ses-{f['session']}_task-{f['task']}")
Invalid BIDS Format#
Problem: Data doesn’t conform to the BIDS standard
Solution:
Validate BIDS dataset using the BIDS Validator
Check dataset_description.json
Verify file naming conventions
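If the Node-based BIDS Validator command-line tool is installed (an assumption; it is a separate tool, typically installed with npm install -g bids-validator), it can be invoked from Python before starting a batch run:
import subprocess
# Requires the bids-validator command to be on the PATH
result = subprocess.run(
    ['bids-validator', 'data/bids_dataset'],
    capture_output=True,
    text=True
)
print(result.stdout)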
Parallel Processing Errors#
Problem: Errors when using n_jobs > 1
Solution:
Start with n_jobs=1 to identify the issue
Check for file locking issues
Ensure the output directory is writable
Reduce n_jobs if system resources are limited
Memory Issues#
Problem: Out-of-memory errors during batch processing
Solution:
Reduce n_jobs to process fewer files in parallel
Process subjects in smaller batches (see the sketch after this list)
Increase available system RAM or use a machine with more memory
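One way to keep memory bounded is to split the subject list into batches and call bids_preproc once per batch. A sketch reusing the subjects parameter shown earlier; the subject IDs here are illustrative:
from eegprep import bids_preproc
all_subjects = ['01', '02', '03', '04', '05', '06']  # illustrative IDs
batch_size = 2
for start in range(0, len(all_subjects), batch_size):
    batch = all_subjects[start:start + batch_size]
    print(f"Processing subjects: {batch}")
    bids_preproc(
        bids_root='data/bids_dataset',
        output_dir='data/bids_dataset/derivatives/eegprep',
        subjects=batch,
        n_jobs=1  # keep parallelism low while memory is constrained
    )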
Best Practices#
Validate BIDS format: Use the BIDS Validator before processing
Backup original data: Keep a copy of raw data before preprocessing
Document parameters: Record preprocessing parameters in a configuration file
Quality control: Visually inspect preprocessed data
Version control: Track eegprep version used for reproducibility
Parallel processing: Use n_jobs to speed up batch processing
Monitor progress: Use verbose=True to track processing status
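For the parameter-documentation and version-tracking practices above, a small record file is often enough. A minimal sketch using the standard library; it assumes eegprep is installed as a package so its version can be looked up:
import json
from importlib.metadata import version
record = {
    'eegprep_version': version('eegprep'),  # assumes the package is installed
    'parameters': {'highpass': 1, 'lowpass': 100, 'ica': True}
}
with open('preprocessing_config.json', 'w') as f:
    json.dump(record, f, indent=2)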
Next Steps#
Now that you understand BIDS workflows:
Read the Preprocessing Pipeline guide for detailed preprocessing steps
Explore the Configuration guide for parameter tuning
Check the Advanced Topics for custom pipelines
Review the API Reference for detailed function documentation