BIDS Workflow#

This guide covers working with datasets organized according to the Brain Imaging Data Structure (BIDS) in eegprep. BIDS is a standard for organizing neuroimaging data that makes datasets easier to share and to process consistently.

BIDS Dataset Structure#

A typical BIDS EEG dataset has the following structure:

dataset/
├── sub-01/
│   ├── ses-01/
│   │   └── eeg/
│   │       ├── sub-01_ses-01_task-rest_eeg.set
│   │       ├── sub-01_ses-01_task-rest_eeg.fdt
│   │       ├── sub-01_ses-01_task-rest_channels.tsv
│   │       ├── sub-01_ses-01_task-rest_eeg.json
│   │       └── sub-01_ses-01_task-rest_events.tsv
│   └── ses-02/
│       └── eeg/
│           └── ...
├── sub-02/
│   └── ses-01/
│       └── eeg/
│           └── ...
├── derivatives/
│   └── eegprep/
│       ├── sub-01/
│       │   └── ses-01/
│       │       └── eeg/
│       │           └── sub-01_ses-01_task-rest_eeg_preprocessed.set
│       └── sub-02/
│           └── ...
├── README
├── CHANGES
├── dataset_description.json
├── participants.tsv
└── participants.json

Key BIDS Files:

_eeg.set: EEGLAB dataset file (header and metadata)
_eeg.fdt: Binary file holding the raw samples that accompany the .set file
_channels.tsv: Channel information (name, type, units)
_eeg.json: EEG metadata (sampling rate, reference, etc.)
_events.tsv: Event markers and timing
dataset_description.json: Dataset metadata
participants.tsv: Participant information
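
The file names listed above encode their metadata as key-value entities (sub-, ses-, task-) followed by a suffix and an extension. The sketch below shows one way to split such a name using only the standard library. The helper parse_bids_name is hypothetical and not part of eegprep; it handles the simple names used in this guide and is not a full BIDS parser.

```python
import re

# One entity looks like "key-value", for example "sub-01" or "task-rest".
_ENTITY = re.compile(r"^(?P<key>[a-zA-Z]+)-(?P<value>[a-zA-Z0-9]+)$")


def parse_bids_name(filename):
    """Split a BIDS-style file name into entities, suffix, and extension.

    Hypothetical helper for illustration only (not an eegprep function).
    """
    stem, dot, extension = filename.partition(".")
    entities = {}
    suffix_parts = []
    for part in stem.split("_"):
        match = _ENTITY.match(part)
        if match:
            entities[match["key"]] = match["value"]
        else:
            # Anything that is not "key-value" belongs to the suffix.
            suffix_parts.append(part)
    return {
        "entities": entities,
        "suffix": "_".join(suffix_parts),
        "extension": dot + extension,
    }


info = parse_bids_name("sub-01_ses-01_task-rest_eeg.set")
print(info["entities"])   # {'sub': '01', 'ses': '01', 'task': 'rest'}
print(info["suffix"], info["extension"])
```

Files without entities, such as dataset_description.json, come back with an empty entity dictionary and the whole stem as the suffix.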

Loading BIDS Data#

Using pop_load_frombids#

Load a single file from a BIDS dataset:

from eegprep import pop_load_frombids

# Load a specific file
eeg = pop_load_frombids(
    bids_root='data/bids_dataset',
    subject='01',
    session='01',
    task='rest'
)

print(f"Loaded: {eeg.nbchan} channels, {eeg.pnts} samples")
print(f"Sampling rate: {eeg.srate} Hz")

Parameters:

bids_root: Path to the BIDS dataset root directory
subject: Subject ID (without ‘sub-’ prefix)
session: Session ID (optional, without ‘ses-’ prefix)
task: Task name (optional)
run: Run number (optional)
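
To see how these parameters relate to the directory layout shown earlier, the following sketch builds the file path that a given subject, session, task, and run would correspond to. It is only an illustration of the naming scheme: expected_eeg_path is a hypothetical helper, and how pop_load_frombids resolves files internally is not described in this guide.

```python
from pathlib import PurePosixPath


def expected_eeg_path(bids_root, subject, session=None, task=None,
                      run=None, extension=".set"):
    """Build the BIDS path implied by the given entities.

    Hypothetical helper for illustration only (not an eegprep function).
    """
    folder = PurePosixPath(bids_root) / f"sub-{subject}"
    name_parts = [f"sub-{subject}"]
    if session is not None:
        # The session appears both as a folder level and in the file name.
        folder = folder / f"ses-{session}"
        name_parts.append(f"ses-{session}")
    if task is not None:
        name_parts.append(f"task-{task}")
    if run is not None:
        name_parts.append(f"run-{run}")
    name_parts.append("eeg")
    return folder / "eeg" / ("_".join(name_parts) + extension)


path = expected_eeg_path("data/bids_dataset", "01", session="01", task="rest")
print(path)  # data/bids_dataset/sub-01/ses-01/eeg/sub-01_ses-01_task-rest_eeg.set
```

Comparing this path with what is actually on disk is a quick way to spot a wrong subject or session ID before loading.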

Loading with Additional Parameters#

from eegprep import pop_load_frombids

# Load with specific run and additional options
eeg = pop_load_frombids(
    bids_root='data/bids_dataset',
    subject='01',
    session='01',
    task='oddball',
    run='01',
    preload=True  # Load data into memory
)

Listing Available Files#

Find all EEG files in a BIDS dataset:

from eegprep import bids_list_eeg_files

# List all EEG files
files = bids_list_eeg_files('data/bids_dataset')

for file_info in files:
    print(f"Subject: {file_info['subject']}")
    print(f"Session: {file_info['session']}")
    print(f"Task: {file_info['task']}")
    print(f"File: {file_info['file']}")
    print()
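
For larger datasets it can help to summarize the listing per subject rather than print every file. The sketch below groups entries by subject. It assumes each entry is a dictionary with the 'subject', 'session', 'task', and 'file' keys used in the loop above; the example list is made up so that the snippet runs without a dataset.

```python
from collections import defaultdict


def group_by_subject(files):
    """Group file-info dictionaries by their 'subject' key."""
    grouped = defaultdict(list)
    for file_info in files:
        grouped[file_info["subject"]].append(file_info)
    return dict(grouped)


# Made-up entries standing in for the result of bids_list_eeg_files().
example_files = [
    {"subject": "01", "session": "01", "task": "rest",
     "file": "sub-01_ses-01_task-rest_eeg.set"},
    {"subject": "01", "session": "02", "task": "rest",
     "file": "sub-01_ses-02_task-rest_eeg.set"},
    {"subject": "02", "session": "01", "task": "rest",
     "file": "sub-02_ses-01_task-rest_eeg.set"},
]

grouped = group_by_subject(example_files)
for subject, entries in sorted(grouped.items()):
    print(f"sub-{subject}: {len(entries)} file(s)")
```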

Running Batch Preprocessing#

Using bids_preproc#

Process all files in a BIDS dataset with a single command:

from eegprep import bids_preproc

# Run preprocessing on entire dataset
bids_preproc(
    bids_root='data/bids_dataset',
    output_dir='data/bids_dataset/derivatives/eegprep',
    overwrite=False
)

Parameters:

bids_root: Path to BIDS dataset root
output_dir: Output directory for preprocessed data
overwrite: Whether to overwrite existing files
n_jobs: Number of parallel jobs (default: 1)

Batch Processing with Custom Parameters#

from eegprep import bids_preproc

# Custom preprocessing parameters
bids_preproc(
    bids_root='data/bids_dataset',
    output_dir='data/bids_dataset/derivatives/eegprep',
    preproc_params={
        'flatline_criterion': 5,
        'highpass': 1,
        'lowpass': 100,
        'asr_criterion': 20,
        'ica': True,
        'iclabel': True
    },
    n_jobs=4  # Use 4 parallel jobs
)

Parallel Processing#

Process multiple subjects in parallel:

from eegprep import bids_preproc

# Process with 8 parallel jobs
bids_preproc(
    bids_root='data/bids_dataset',
    output_dir='data/bids_dataset/derivatives/eegprep',
    n_jobs=8,
    verbose=True
)

Note: The number of jobs should not exceed the number of CPU cores available.
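
One way to follow this note is to cap the requested job count at the number of cores reported by the operating system. The helper below is a hypothetical convenience, not part of eegprep; its result would be passed as n_jobs.

```python
import os


def safe_n_jobs(requested, n_files=None):
    """Clamp a requested job count to the available CPU cores.

    Optionally also cap it at the number of files, since extra workers
    would have nothing to do.
    """
    available = os.cpu_count() or 1  # cpu_count() may return None
    n_jobs = max(1, min(requested, available))
    if n_files is not None:
        n_jobs = max(1, min(n_jobs, n_files))
    return n_jobs


print(safe_n_jobs(8))
print(safe_n_jobs(8, n_files=3))
```

CPU count is only one limit. Each parallel job also needs memory for its dataset, so a lower value may be required on machines with little RAM.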

Processing Specific Subjects#

from eegprep import bids_preproc

# Process only specific subjects
bids_preproc(
    bids_root='data/bids_dataset',
    output_dir='data/bids_dataset/derivatives/eegprep',
    subjects=['01', '02', '03']
)
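
When the full subject list is too large to process in one call, it can be split into smaller batches, and each batch passed as the subjects argument shown above. The generator below is a minimal sketch of that splitting step; batched is a hypothetical helper, and the subject IDs are examples.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


subjects = ["01", "02", "03", "04", "05"]
batches = list(batched(subjects, 2))
print(batches)  # [['01', '02'], ['03', '04'], ['05']]
```

Processing one batch at a time keeps peak memory use bounded and makes it easier to resume after a failure.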

Output Structure#

After running eegprep.bids_preproc(), the output is organized in the derivatives directory:

dataset/derivatives/eegprep/
├── sub-01/
│   ├── ses-01/
│   │   └── eeg/
│   │       ├── sub-01_ses-01_task-rest_eeg_preprocessed.set
│   │       ├── sub-01_ses-01_task-rest_eeg_preprocessed.fdt
│   │       ├── sub-01_ses-01_task-rest_channels.tsv
│   │       └── sub-01_ses-01_task-rest_eeg.json
│   └── ses-02/
│       └── eeg/
│           └── ...
├── sub-02/
│   └── ...
├── dataset_description.json
└── README

Derivatives Format#

The derivatives directory follows the BIDS layout and contains:

_eeg_preprocessed.set: Preprocessed EEG data
_eeg_preprocessed.fdt: Preprocessed data file
_channels.tsv: Updated channel information
_eeg.json: Updated metadata
dataset_description.json: Derivatives dataset description
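
For derivatives that are written by hand (for example with pop_saveset in a custom script), a dataset_description.json can be created separately. The sketch below writes a minimal one with the fields the BIDS specification expects for derivative datasets (Name, BIDSVersion, GeneratedBy, plus DatasetType). The dataset name, BIDS version, and pipeline version are placeholder values, and the exact content that bids_preproc writes is not documented here.

```python
import json
import tempfile
from pathlib import Path


def write_derivative_description(output_dir, pipeline_version,
                                 bids_version="1.8.0"):
    """Write a minimal dataset_description.json for a derivatives folder.

    Hypothetical helper; all values are placeholders to adapt.
    """
    description = {
        "Name": "eegprep derivatives",
        "BIDSVersion": bids_version,
        "DatasetType": "derivative",
        "GeneratedBy": [{"Name": "eegprep", "Version": pipeline_version}],
    }
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / "dataset_description.json"
    target.write_text(json.dumps(description, indent=2), encoding="utf-8")
    return target


# Demonstrated in a temporary directory so nothing real is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    target = write_derivative_description(
        Path(tmp) / "derivatives" / "eegprep", pipeline_version="0.0.0"
    )
    written = json.loads(target.read_text(encoding="utf-8"))
    print(written["GeneratedBy"])
```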

Loading Preprocessed Data#

Load preprocessed data from derivatives:

from eegprep import pop_load_frombids

# Load preprocessed data
eeg = pop_load_frombids(
    bids_root='data/bids_dataset/derivatives/eegprep',
    subject='01',
    session='01',
    task='rest'
)

Integration with Other Tools#

Integration with MNE-Python#

Convert eegprep data to MNE format:

from eegprep import eeg_eeg2mne
import mne

# Convert to MNE Raw object
raw = eeg_eeg2mne(eeg)

# Now use MNE functions
raw.plot()
raw.compute_psd().plot()

Converting Back to eegprep#

from eegprep import eeg_mne2eeg

# Convert MNE Raw back to eegprep format
eeg = eeg_mne2eeg(raw)

Integration with EEGLAB#

Save preprocessed data in EEGLAB format:

from eegprep import pop_saveset

# Save as EEGLAB .set file
pop_saveset(eeg, 'preprocessed_data.set')

Load EEGLAB files:

from eegprep import pop_loadset

# Load EEGLAB .set file
eeg = pop_loadset('data.set')

Working with BIDS Metadata#

Accessing Channel Information#

from eegprep import pop_load_frombids

eeg = pop_load_frombids(
    bids_root='data/bids_dataset',
    subject='01',
    session='01',
    task='rest'
)

# Access channel information
for i, chan in enumerate(eeg.chanlocs):
    print(f"Channel {i}: {chan['labels']}")
    print(f"  Type: {chan['type']}")
    print(f"  Location: ({chan['X']}, {chan['Y']}, {chan['Z']})")

Accessing Event Information#

# Access events
if hasattr(eeg, 'event'):
    for event in eeg.event:
        print(f"Event type: {event['type']}")
        print(f"Latency: {event['latency']} samples")
        print(f"Duration: {event['duration']} samples")
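
Latencies are reported in samples, so converting them to seconds requires the sampling rate. The sketch below shows the arithmetic. It assumes the EEGLAB convention of 1-based sample indices; whether eegprep keeps that convention is an assumption here, so the offset is exposed as the one_based flag. The example events are made up.

```python
def event_onsets_seconds(events, srate, one_based=True):
    """Convert event latencies from samples to seconds.

    With one_based=True, sample 1 is treated as time 0 (EEGLAB convention,
    assumed here rather than confirmed for eegprep).
    """
    offset = 1 if one_based else 0
    return [(event["latency"] - offset) / srate for event in events]


# Made-up events with the same keys as in the loop above.
example_events = [
    {"type": "stimulus", "latency": 1, "duration": 0},
    {"type": "response", "latency": 251, "duration": 0},
]

onsets = event_onsets_seconds(example_events, srate=250)
print(onsets)  # [0.0, 1.0]
```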

Accessing Metadata#

# Access BIDS metadata
if hasattr(eeg, 'etc') and 'bids' in eeg.etc:
    bids_info = eeg.etc['bids']  # dictionary-style access, consistent with the membership test above
    print(f"Task: {bids_info.get('task')}")
    print(f"Sampling rate: {bids_info.get('srate')} Hz")

Common BIDS Workflows#

Complete Preprocessing Workflow#

from eegprep import (
    pop_load_frombids,
    clean_artifacts,
    iclabel,
    pop_saveset
)

# 1. Load data
eeg = pop_load_frombids(
    bids_root='data/bids_dataset',
    subject='01',
    session='01',
    task='rest'
)

# 2. Preprocess
eeg = clean_artifacts(
    eeg,
    highpass=1,
    lowpass=100,
    ica=True,
    iclabel=True
)

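# 3. Save to derivatives (create the output folder first if it does not exist)
from pathlib import Path

out_dir = Path('data/bids_dataset/derivatives/eegprep/sub-01/ses-01/eeg')
out_dir.mkdir(parents=True, exist_ok=True)
pop_saveset(eeg, str(out_dir / 'sub-01_ses-01_task-rest_eeg_preprocessed.set'))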

Batch Processing with Quality Control#

from eegprep import bids_preproc, bids_list_eeg_files
import json

# 1. List all files
files = bids_list_eeg_files('data/bids_dataset')
print(f"Found {len(files)} EEG files")

# 2. Run preprocessing
bids_preproc(
    bids_root='data/bids_dataset',
    output_dir='data/bids_dataset/derivatives/eegprep',
    n_jobs=4
)

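# 3. Create processing report
from datetime import date

report = {
    'total_files': len(files),
    'preprocessing_date': date.today().isoformat(),  # date of this run
    # Placeholder values: record the parameters actually used for the run
    'parameters': {
        'highpass': 1,
        'lowpass': 100,
        'ica': True
    }
}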

with open('preprocessing_report.json', 'w') as f:
    json.dump(report, f, indent=2)
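
A useful addition to the report is a check that every input file has a matching output. The sketch below derives the expected output path from each file entry, using the _eeg_preprocessed.set naming shown in the Output Structure section, and lists inputs whose output is missing. Both helpers are hypothetical, and the entries and temporary directory are made up so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def expected_output(output_dir, file_info):
    """Map an input entry to its expected preprocessed .set path."""
    name = Path(file_info["file"]).name.replace(
        "_eeg.set", "_eeg_preprocessed.set"
    )
    folder = Path(output_dir) / f"sub-{file_info['subject']}"
    if file_info.get("session"):
        folder = folder / f"ses-{file_info['session']}"
    return folder / "eeg" / name


def find_missing_outputs(output_dir, files):
    """Return the input files that have no preprocessed counterpart."""
    return [f["file"] for f in files
            if not expected_output(output_dir, f).exists()]


with tempfile.TemporaryDirectory() as tmp:
    files = [
        {"subject": "01", "session": "01",
         "file": "sub-01_ses-01_task-rest_eeg.set"},
        {"subject": "02", "session": "01",
         "file": "sub-02_ses-01_task-rest_eeg.set"},
    ]
    # Simulate a finished output for the first file only.
    done = expected_output(tmp, files[0])
    done.parent.mkdir(parents=True)
    done.touch()
    missing = find_missing_outputs(tmp, files)
    print(missing)  # ['sub-02_ses-01_task-rest_eeg.set']
```

The resulting list could be stored in the report under its own key so that failed files are easy to rerun.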

Troubleshooting BIDS Workflows#

File Not Found#

Problem: FileNotFoundError when loading BIDS data

Solution:

Verify BIDS dataset structure
Check subject and session IDs
Use eegprep.bids_list_eeg_files() to find available files

from eegprep import bids_list_eeg_files

files = bids_list_eeg_files('data/bids_dataset')
for f in files:
    print(f"sub-{f['subject']}_ses-{f['session']}_task-{f['task']}")

Invalid BIDS Format#

Problem: Data doesn’t conform to BIDS standard

Solution:

Validate BIDS dataset using the BIDS Validator
Check dataset_description.json
Verify file naming conventions
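
Before running the full BIDS Validator, a quick local check of dataset_description.json can catch the most common problem. The sketch below looks for the two fields that the BIDS specification requires in that file (Name and BIDSVersion). It is a hypothetical pre-check and not a replacement for the validator.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("Name", "BIDSVersion")


def check_dataset_description(bids_root):
    """Return a list of problems found in dataset_description.json."""
    path = Path(bids_root) / "dataset_description.json"
    if not path.is_file():
        return ["dataset_description.json is missing"]
    try:
        content = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as error:
        return [f"dataset_description.json is not valid JSON: {error}"]
    return [f"missing required field: {field}"
            for field in REQUIRED_FIELDS if field not in content]


# Demonstrated on a temporary, deliberately incomplete dataset.
with tempfile.TemporaryDirectory() as tmp:
    problems_before = check_dataset_description(tmp)
    (Path(tmp) / "dataset_description.json").write_text(
        '{"Name": "Demo"}', encoding="utf-8"
    )
    problems_after = check_dataset_description(tmp)

print(problems_before)
print(problems_after)
```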

Parallel Processing Errors#

Problem: Errors when using n_jobs > 1

Solution:

Start with n_jobs=1 to identify the issue
Check for file locking issues
Ensure output directory is writable
Reduce n_jobs if system resources are limited

Memory Issues#

Problem: Out of memory errors during batch processing

Solution:

Reduce n_jobs to process fewer files in parallel
Process subjects in smaller batches
Run on a machine with more available RAM

Best Practices#

Validate BIDS format: Use the BIDS Validator before processing
Backup original data: Keep a copy of raw data before preprocessing
Document parameters: Record preprocessing parameters in a configuration file
Quality control: Visually inspect preprocessed data
Version control: Track eegprep version used for reproducibility
Parallel processing: Use n_jobs to speed up batch processing
Monitor progress: Use verbose=True to track processing status
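
For the version-tracking practice above, the installed package version can be read with the standard library and stored next to the preprocessing parameters. The sketch assumes the distribution is installed under the name eegprep; if it is not found, a placeholder string is recorded instead.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or a placeholder."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"


# "eegprep" as the distribution name is an assumption.
provenance = {"eegprep_version": installed_version("eegprep")}
print(provenance)
```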

Next Steps#

Now that you understand BIDS workflows:

Read the Preprocessing Pipeline guide for detailed preprocessing steps
Explore the Configuration guide for parameter tuning
Check the Advanced Topics for custom pipelines
Review the API Reference for detailed function documentation