Core Functions and Classes#
This section documents the core functions and classes that form the foundation of eegprep.
Main Pipeline#
- eegprep.bids_preproc(root, *, ApplyChanlocs=None, ApplyEvents=None, ApplyMetadata=None, EventColumn=None, Subjects=None, Sessions=None, Runs=None, Tasks=None, SkipIfPresent=True, NumJobs=None, ReservePerJob='', UseHashes=False, ReturnData=False, OutputDir=None, SamplingRate=None, OnlyChannelsWithPosition=True, OnlyModalities=(), WithInterp=False, WithPicard=False, WithICLabel=False, WithReport=True, CommonAverageReference=True, ChannelCriterion=0.8, LineNoiseCriterion=4.0, BurstCriterion=5.0, WindowCriterion=0.25, Highpass=(0.25, 0.75), ChannelCriterionMaxBadTime=0.5, BurstCriterionRefMaxBadChns=0.075, BurstCriterionRefTolerances=(-inf, 5.5), BurstRejection='off', WindowCriterionTolerances=(-inf, 7), FlatlineCriterion=5.0, NumSamples=50, NoLocsChannelCriterion=0.45, NoLocsChannelCriterionExcluded=0.1, MaxMem=64, Distance='euclidian', Channels=None, Channels_ignore=None, availableRAM_GB=None, EpochEvents=None, EpochLimits=(-1, 2), EpochBaseline=None, StageNames=('desc-cleaned', 'desc-picard', 'desc-iclabel', 'desc-epoch'), MinimizeDiskUsage=True, bidschanloc=None, bidsevent=None, bidsmetadata=None, eventtype=None, subjects=None, sessions=None, runs=None, tasks=None, outputdir=None, _lock=<contextlib.nullcontext object>, _n_skipped=None, _k=0, _n_total=1, _n_jobs=1, _t0=1764115012.876383)#
Apply data cleaning to EEG files in a BIDS dataset.
- Return type:
- Parameters:
root (str)
ApplyChanlocs (bool | None)
ApplyEvents (bool | None)
ApplyMetadata (bool | None)
EventColumn (str | None)
SkipIfPresent (bool)
NumJobs (int | None)
ReservePerJob (str)
UseHashes (bool)
ReturnData (bool)
OutputDir (str | None)
SamplingRate (float | None)
OnlyChannelsWithPosition (bool)
WithInterp (bool)
WithPicard (bool)
WithICLabel (bool)
WithReport (bool)
CommonAverageReference (bool)
ChannelCriterionMaxBadTime (float)
BurstRejection (str)
NumSamples (int)
NoLocsChannelCriterion (float)
NoLocsChannelCriterionExcluded (float)
MaxMem (int)
Distance (str)
availableRAM_GB (float | None)
MinimizeDiskUsage (bool)
bidschanloc (bool | None)
bidsevent (bool | None)
bidsmetadata (bool | None)
eventtype (str | None)
outputdir (str | None)
_lock (Lock | None)
_n_skipped (Value | None)
_k (int)
_n_total (int)
_n_jobs (int)
_t0 (float)
Parameters#
- root : str
The root directory containing BIDS data or a single EEG file path.
(BIDS import stage parameters)
- ApplyMetadata : bool
Whether to apply metadata from BIDS sidecar files when loading raw EEG data. (default True)
- ApplyEvents : bool
Whether to apply events from BIDS sidecar files when loading raw EEG data. (default False)
- ApplyChanlocs : bool
Whether to apply channel locations from BIDS sidecar files when loading raw EEG data. (default True)
- EventColumn : str
Optionally the column name in the BIDS events file to use for event types; if not set, will be inferred heuristically.
- Subjects : Sequence[str | int], optional
A sequence of subject identifiers or (zero-based) indices to filter the files by. If empty, all subjects are included.
- Sessions : Sequence[str | int], optional
A sequence of session identifiers or (zero-based) indices to filter the files by. If empty, all sessions are included.
- Runs : Sequence[str | int], optional
A sequence of run numbers or identifiers to filter the files by. If empty, all runs are included. Note that zero-based indexing does not apply to runs, unlike subjects and sessions, since runs are already integers.
- Tasks : Sequence[str] | str, optional
A sequence of task names or single task to filter the files by. If empty, all tasks are included (default is an empty sequence).
- OutputDir : str
The name of the subdirectory where cleaned files will be saved. This can start with the placeholder ‘{root}’, which will be replaced with the root path of the BIDS dataset. Defaults to ‘{root}/derivatives/eegprep’ if not specified.
(overall run configuration)
- SkipIfPresent : bool
Whether to skip processing files that already have a cleaned version present.
- NumJobs : int, optional
The number of jobs to run in parallel. If set to -1, this will default to the number of logical cores on the system. If ReservePerJob is also specified, this will be treated as a maximum, otherwise as the total. If neither of the two parameters is specified, a single job will run. Note: as usual when running multiple processes in Python, you need to use the if __name__ == "__main__": guard pattern in your main processing script.
- ReservePerJob : str
Optionally the resource amount and type to reserve per job, e.g. ‘4GB’ or ‘2CPU’; the run will then use as many jobs as fit within the system resources of the specified type.
* You can also specify how much of the total system resources should be withheld as a margin for other programs on the computer, by following the amount with a : and then the margin, as in ‘4GB:10GB’ (always leave 10GB unused) or ‘2CPU:10%’ (always leave 10% of the total of that resource unused). This also works with other metrics.
* One may also specify a total or maximum number of jobs, as in ‘10total’ or ‘10max’.
* Multiple criteria can be specified in a comma-separated list of reservations, e.g. ‘4GB:20%, 2CPU, 5max’.
* If neither this nor NumJobs is specified, a single job will run.
Note that the system will also run in serial when in debug mode and when on a platform that does not cleanly support multiprocessing.
Tip: a good way to size this is to perform a serial run, monitor how much peak RAM a single job takes, and then set this to <PeakUsage>GB:<YourMargin>GB, where YourMargin is however much you want to leave to other programs, e.g., 5GB (this will depend on what else you expect to be running on the machine).
- UseHashes : bool
Whether to bake hashes into intermediate file names; if you experiment with alternative preprocessing settings, it is recommended to enable this or disable the SkipIfPresent option, since otherwise the routine may pick up a stale result.
- ReturnData : bool
Whether to return the final EEG data objects as a list. Note that this can use quite a lot of memory for large studies and it may be better to iterate over the preprocessed files in downstream analyses.
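To make the ReservePerJob syntax described above more concrete, the following is a minimal sketch of how such a string could be split into its parts. It is not eegprep's parser: the names `Reservation` and `parse_reservations` are hypothetical, and the accepted grammar is only an assumption based on the examples in the parameter description.

```python
import re
from typing import List, NamedTuple, Optional


class Reservation(NamedTuple):
    amount: float          # numeric part, e.g. 4.0 for '4GB'
    kind: str              # unit or qualifier, e.g. 'GB', 'CPU', 'total', 'max'
    margin: Optional[str]  # raw margin text such as '10GB' or '20%', if given


# A number followed by a unit or qualifier (assumed grammar, for illustration).
_ITEM = re.compile(r'^(\d+(?:\.\d+)?)\s*([A-Za-z]+)$')


def parse_reservations(spec: str) -> List[Reservation]:
    """Split a ReservePerJob-style string into its components (illustrative only)."""
    result = []
    for item in spec.split(','):
        item = item.strip()
        if not item:
            continue
        # The optional margin follows the amount after a colon.
        main, _, margin = item.partition(':')
        match = _ITEM.match(main.strip())
        if match is None:
            raise ValueError(f"cannot parse reservation {item!r}")
        result.append(
            Reservation(float(match.group(1)), match.group(2), margin.strip() or None)
        )
    return result


for reservation in parse_reservations('4GB:20%, 2CPU, 5max'):
    print(reservation)
```

Each comma-separated entry becomes one record, so a combined specification such as ‘4GB:20%, 2CPU, 5max’ yields three independent constraints.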
(overall processing parameters)
- OnlyChannelsWithPosition : bool
Whether to retain only channels for which positions were recorded or could be inferred. If this is not set, then OnlyModalities should be set so as to retain only modalities that should be preprocessed together.
- OnlyModalities : Sequence[str], optional
If set, retain only channels that have the associated modalities. If enabled, this is typically set to [‘EEG’] but may also include other ExG modalities such as EOG or EMG that have the same unit and scale as EEG. If non-electrophysiological modalities are included, some artifact removal steps may not function correctly.
- SamplingRate : float
Desired sampling rate for the preprocessed data. If not specified, will retain the original sampling rate.
- WithInterp : bool
Whether to reinterpolate dropped channels, thus retaining the same channel count as the raw data.
- WithPicard : bool
Whether to apply PICARD ICA decomposition after cleaning.
- WithICLabel : bool
Whether to apply ICLabel classification after ICA. Normally requires WithPicard=True.
- CommonAverageReference : bool
Whether to transform the EEG data to a common average referencing scheme; recommended for cross-study processing.
(parameters for artifact removal - same as in clean_artifacts function)
- ChannelCriterion : float or ‘off’
Minimum channel correlation threshold for channel cleaning; channels below this value are considered bad. Pass ‘off’ to skip channel criterion. Default 0.8.
- LineNoiseCriterion : float or ‘off’
Z-score threshold for line-noise contamination; channels exceeding this are considered bad. ‘off’ disables line-noise check. Default 4.0.
- BurstCriterion : float or ‘off’
ASR standard-deviation cutoff for high-amplitude bursts; values above this relative to calibration data are repaired (or removed if BurstRejection=‘on’). ‘off’ skips ASR. Default 5.0.
- WindowCriterion : float or ‘off’
Fraction (0-1) or count of channels allowed to be bad per window; windows with more bad channels are removed. ‘off’ disables final window removal. Default 0.25.
- Highpass : tuple(float, float) or ‘off’
Transition band [low, high] in Hz for initial high-pass filtering. ‘off’ skips drift removal. Default (0.25, 0.75).
- ChannelCriterionMaxBadTime : float
Maximum tolerated time (seconds or fraction of recording) a channel may be flagged bad before being removed. Default 0.5.
- BurstCriterionRefMaxBadChns : float or ‘off’
Maximum fraction of bad channels tolerated when selecting calibration data for ASR. ‘off’ uses all data for calibration. Default 0.075.
- BurstCriterionRefTolerances : tuple(float, float) or ‘off’
Power Z-score tolerances for selecting calibration windows in ASR. ‘off’ uses all data. Default (-inf, 5.5).
- BurstRejection : str
‘on’ to reject (drop) burst segments instead of reconstructing with ASR, ‘off’ to apply ASR repair. Default ‘off’.
- WindowCriterionTolerances : tuple(float, float) or ‘off’
Power Z-score bounds for final window removal. ‘off’ disables this stage. Default (-inf, 7).
- FlatlineCriterion : float or ‘off’
Maximum flatline duration in seconds; channels exceeding this are removed. ‘off’ disables flatline removal. Default 5.0.
- NumSamples : int
Number of RANSAC samples for channel cleaning. Default 50.
- NoLocsChannelCriterion : float
Correlation threshold for fallback channel cleaning when no channel locations. Default 0.45.
- NoLocsChannelCriterionExcluded : float
Fraction of channels excluded when assessing correlation in nolocs cleaning. Default 0.1.
- MaxMem : int
Maximum memory in MB for ASR processing. Default 64.
- Distance : str
Distance metric for ASR processing (‘euclidian’). Default ‘euclidian’.
- Channels : Sequence[str] or None
List of channel labels to include before cleaning (pop_select). Default None.
- Channels_ignore : Sequence[str] or None
List of channel labels to exclude before cleaning. Default None.
- availableRAM_GB : float or None
Available system RAM in GB to adjust MaxMem. Default None.
(parameters for an optional epoching and baseline removal step)
- EpochEvents : str or Sequence[str] or None
Optionally a list of event types or a regular expression matching event types at which to time-lock epochs. If None (default), no epoching is done. If [], will time-lock to every event in the data (warning: this can greatly increase the amount of data if epochs overlap!)
- EpochLimits : Sequence[float]
The time limits in seconds relative to the event markers for epoching. Default (-1, 2).
- EpochBaseline : Sequence[float] or None
Optionally a time range in seconds relative to the event markers for baseline correction. If None (default), no baseline correction is applied. The special value None can be used to refer to the respective end of the epoch limits, as in (None, 0).
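The relationship between EpochLimits and EpochBaseline can be illustrated with a small, dependency-free sketch on a single channel. This is a generic illustration of epoch extraction and baseline subtraction, not eegprep's implementation; the function `extract_epoch` and its handling of out-of-range epochs are assumptions made for the example.

```python
def extract_epoch(signal, srate, event_sample, limits=(-1, 2), baseline=None):
    """Cut one epoch around event_sample and optionally subtract a baseline mean.

    limits and baseline are in seconds relative to the event; a None entry in
    baseline stands for the corresponding end of the epoch limits.
    """
    start = event_sample + round(limits[0] * srate)
    stop = event_sample + round(limits[1] * srate)
    if start < 0 or stop > len(signal):
        return None  # epoch would extend past the recording; skip it
    epoch = list(signal[start:stop])
    if baseline is not None:
        lo = limits[0] if baseline[0] is None else baseline[0]
        hi = limits[1] if baseline[1] is None else baseline[1]
        # Convert the baseline bounds (seconds) to sample indices within the epoch.
        i0 = round((lo - limits[0]) * srate)
        i1 = round((hi - limits[0]) * srate)
        mean = sum(epoch[i0:i1]) / (i1 - i0)
        epoch = [x - mean for x in epoch]
    return epoch


srate = 10                            # samples per second (toy value)
signal = [5.0] * 20 + [8.0] * 30      # 5 s of data with a step at sample 20
epoch = extract_epoch(signal, srate, event_sample=20,
                      limits=(-1, 2), baseline=(None, 0))
print(len(epoch), epoch[0], epoch[-1])
```

With limits of (-1, 2) the epoch spans three seconds, and a baseline of (None, 0) averages everything from the epoch start up to the event itself.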
(misc parameters)
- StageNames : Sequence[str]
List of file name parts for the preprocessing stages, in the order cleaning, ICA, ICLabel, epoching; these can be adjusted when working with different preprocessed versions (e.g., using different parameters for cleaning). It is recommended that these start with ‘desc-’.
- MinimizeDiskUsage : bool
Whether to minimize disk usage by not saving some intermediate files (specifically the PICARD output if WithICLabel=False). Default True.
(parameters retained for backwards compatibility with EEGLAB’s pop_importbids call signature)
- bidsmetadata : bool
alias for ApplyMetadata
- bidsevent : bool
alias for ApplyEvents
- bidschanloc : bool
alias for ApplyChanlocs
- eventtype : str
alias for EventColumn
- subjects : Sequence[str | int], optional
alias for Subjects
- sessions : Sequence[str | int], optional
alias for Sessions
- runs : Sequence[str | int], optional
alias for Runs
- tasks : Sequence[str] | str, optional
alias for Tasks
- outputdir : str
alias for OutputDir
Returns#
- result : Dict[str, Any] | List[Dict[str, Any]] | None
Depending on ReturnData, either a list of EEG objects (if BIDS root folder was specified) or a single EEG object (if a single file was specified), otherwise None.
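A call to bids_preproc from a processing script might look like the sketch below. The keyword arguments are taken from the signature above; the helper `run_preprocessing`, the dataset path, and the particular settings are placeholders chosen for illustration, and the call is skipped when eegprep or the dataset folder is not available.

```python
import os


def run_preprocessing(root):
    """Run bids_preproc on a BIDS dataset if both the dataset and eegprep exist."""
    if not os.path.isdir(root):
        print(f"dataset folder not found: {root}")
        return None
    try:
        from eegprep import bids_preproc
    except ImportError:
        print("eegprep is not installed")
        return None
    return bids_preproc(
        root,
        ReservePerJob='4GB:10GB',  # reserve 4 GB per job, leave 10 GB for other programs
        UseHashes=True,            # avoid picking up stale results when settings change
        WithPicard=True,           # ICA decomposition after cleaning
        WithICLabel=True,          # component classification after ICA
        ReturnData=False,          # read the saved files downstream instead
    )


if __name__ == "__main__":
    # The guard matters because jobs may run in separate processes.
    run_preprocessing("/path/to/bids_dataset")  # placeholder path
```

Keeping ReturnData=False follows the note above about memory use in large studies; the preprocessed files can then be read one at a time in later analysis steps.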
- eegprep.bids_list_eeg_files(root, subjects=(), sessions=(), runs=(), tasks=())#
Return a list of all EEG raw-data files in a BIDS dataset.
- Return type:
- Parameters:
Parameters#
- root : str
The root directory containing BIDS data.
- subjects : Sequence[str | int], optional
A sequence of subject identifiers or (zero-based) indices to filter the files by. If empty, all subjects are included.
- sessions : Sequence[str | int], optional
A sequence of session identifiers or (zero-based) indices to filter the files by. If empty, all sessions are included.
- runs : Sequence[str | int], optional
A sequence of run numbers or identifiers to filter the files by. If empty, all runs are included. Note that zero-based indexing does not apply to runs, unlike subjects and sessions, since runs are already integers.
- tasks : Sequence[str] | str, optional
A sequence of task names or single task to filter the files by. If empty, all tasks are included (default is an empty sequence).
Returns#
- List[str]
A list of file paths to EEG files in the BIDS dataset.
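The mix of identifiers and zero-based indices accepted by the subjects and sessions filters can be pictured with the following sketch. It is not the library's code: `resolve_selection` is a hypothetical helper, and the assumption that indices refer to the sorted list of available identifiers is not confirmed by the documentation above.

```python
def resolve_selection(available, selection):
    """Map a mix of identifiers (str) and zero-based indices (int) to identifiers."""
    available = sorted(available)
    if not selection:
        return available  # an empty selection means "include everything"
    resolved = []
    for item in selection:
        if isinstance(item, int):
            # Assumption: an integer indexes into the sorted identifiers.
            resolved.append(available[item])
        elif item in available:
            resolved.append(item)
        else:
            raise KeyError(f"unknown identifier: {item!r}")
    return resolved


subjects = ['01', '02', '03', '04']
print(resolve_selection(subjects, [0, '03']))
print(resolve_selection(subjects, ()))
```

Going by the signature above, the corresponding library call would be of the form bids_list_eeg_files(root, subjects=[0, ‘03’]).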
Data Validation#
- eegprep.eeg_checkset(EEG, load_data=True)#
Validate and set up EEG dataset structure.
Ensures EEG dict has required fields with correct types, computes ICA activations if possible, and loads data from file if specified.
Object-Oriented Interface#
- class eegprep.EEGobj(EEG_or_path)#
Bases: object
Wrapper class for EEG datasets stored as dictionaries.
Provides attribute access to EEG fields and method calls to eegprep functions.
- __init__(EEG_or_path)#
Initialize from an EEG dict or a file path string.
If string: loads dataset with pop_loadset(path).
If dict: uses it directly.
- __getattr__(name)#
Access EEG fields or eegprep functions.
If ‘name’ is a key in EEG, return EEG[name] (convenience).
If ‘name’ is a function in eegprep, return a wrapper that performs self.EEG = func(deepcopy(self.EEG), …) and returns the updated EEG for convenience.
- __setattr__(name, value)#
Set attributes on the underlying EEG dict when possible, else on the wrapper.
- __repr__()#
Multi-line, MNE-like summary of the EEG object.
Shows key metadata, data shape, sampling info, time span, and brief events/channels info.
- __str__()#
Multi-line, MNE-like summary of the EEG object.
Shows key metadata, data shape, sampling info, time span, and brief events/channels info.
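The attribute-delegation pattern described for EEGobj can be sketched without eegprep as follows. This is a simplified stand-in, not the actual class: `DictWrapper`, `set_rate`, and the `FUNCTIONS` registry are hypothetical names, with the registry playing the role of the eegprep function namespace.

```python
from copy import deepcopy


def set_rate(EEG, srate):
    """Stand-in processing function: takes an EEG dict and returns an updated one."""
    EEG['srate'] = srate
    return EEG


FUNCTIONS = {'set_rate': set_rate}  # hypothetical registry of available functions


class DictWrapper:
    def __init__(self, EEG):
        # Bypass our own __setattr__ so the dict is stored on the wrapper itself.
        object.__setattr__(self, 'EEG', EEG)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        EEG = object.__getattribute__(self, 'EEG')
        if name in EEG:
            return EEG[name]
        if name in FUNCTIONS:
            func = FUNCTIONS[name]

            def call(*args, **kwargs):
                # Work on a copy so the previous dict is left untouched.
                result = func(deepcopy(self.EEG), *args, **kwargs)
                object.__setattr__(self, 'EEG', result)
                return result

            return call
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self.EEG:
            self.EEG[name] = value  # existing fields are written to the dict
        else:
            object.__setattr__(self, name, value)


eeg = DictWrapper({'srate': 250, 'nbchan': 32})
original = eeg.EEG
eeg.set_rate(100)      # resolved through __getattr__, replaces eeg.EEG
eeg.nbchan = 16        # written into the underlying dict
print(eeg.srate, original['srate'], eeg.EEG['nbchan'])
```

Because each function call operates on a deep copy, a reference taken to the dict before the call keeps its old values, at the cost of copying the data on every call.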