Core Functions and Classes#
This section documents the core functions and classes that form the foundation of eegprep.
Main Pipeline#
- eegprep.bids_preproc(root, *, ApplyChanlocs=None, ApplyEvents=None, ApplyMetadata=None, EventColumn=None, Subjects=None, Sessions=None, Runs=None, Tasks=None, SkipIfPresent=True, NumJobs=None, ReservePerJob='', UseHashes=False, ReturnData=False, OutputDir=None, SamplingRate=None, OnlyChannelsWithPosition=True, OnlyModalities=(), WithInterp=False, WithICA=False, WithPicard=False, ICAAlgorithm='runica', AmicaArgs=None, WithICLabel=False, WithReport=True, CommonAverageReference=True, ChannelCriterion=0.8, LineNoiseCriterion=4.0, BurstCriterion=5.0, WindowCriterion=0.25, Highpass=(0.25, 0.75), ChannelCriterionMaxBadTime=0.5, BurstCriterionRefMaxBadChns=0.075, BurstCriterionRefTolerances=(-inf, 5.5), BurstRejection='off', WindowCriterionTolerances=(-inf, 7), FlatlineCriterion=5.0, NumSamples=50, NoLocsChannelCriterion=0.45, NoLocsChannelCriterionExcluded=0.1, MaxMem=64, Distance='euclidian', Channels=None, Channels_ignore=None, availableRAM_GB=None, EpochEvents=None, EpochLimits=(-1, 2), EpochBaseline=None, StageNames=('desc-cleaned', 'desc-picard', 'desc-iclabel', 'desc-epoch'), FinalDesc=None, ReportDir=None, MinimizeDiskUsage=True, SaveIntermediateStages=False, IntermediateDir=None, bidschanloc=None, bidsevent=None, bidsmetadata=None, eventtype=None, subjects=None, sessions=None, runs=None, tasks=None, outputdir=None, _lock=<contextlib.nullcontext object>, _n_skipped=None, _k=0, _n_total=1, _n_jobs=1, _t0=1781298326.2115788)
Apply data cleaning to EEG files in a BIDS dataset.
Parameters#
- rootstr
The root directory containing BIDS data or a single EEG file path.
- ApplyMetadatabool
Whether to apply metadata from BIDS sidecar files when loading raw EEG data. (default True)
- ApplyEventsbool
Whether to apply events from BIDS sidecar files when loading raw EEG data. (default False)
- ApplyChanlocsbool
Whether to apply channel locations from BIDS sidecar files when loading raw EEG data. (default True)
- EventColumnstr
Optionally the column name in the BIDS events file to use for event types; if not set, will be inferred heuristically.
- SubjectsSequence[str | int], optional
A sequence of subject identifiers or (zero-based) indices to filter the files by. If empty, all subjects are included.
- SessionsSequence[str | int], optional
A sequence of session identifiers or (zero-based) indices to filter the files by. If empty, all sessions are included.
- RunsSequence[str | int], optional
A sequence of run numbers or identifiers to filter the files by. If empty, all runs are included. Note that zero-based indexing does not apply to runs (unlike subjects and sessions), since run values are already integers.
- TasksSequence[str] | str, optional
A sequence of task names or single task to filter the files by. If empty, all tasks are included (default is an empty sequence).
- OutputDirstr
The name of the subdirectory where cleaned files will be saved. This can start with the placeholder ‘{root}’ which will be replaced with the root path of the BIDS dataset. Defaults to ‘{root}/derivatives/eegprep’ if not specified.
- SkipIfPresentbool
Skip processing files that already have a cleaned version present.
- NumJobsint, optional
The number of jobs to run in parallel. If set to -1, this will default to the number of logical cores on the system. If the ReservePerJob parameter is also specified, this will be treated as a maximum, otherwise as the total. If neither of the two parameters is specified, a single job will run. Note: as usual when running multiple processes in Python, you need to use the if __name__ == "__main__": guard pattern in your main processing script.
- ReservePerJobstr
Optionally the resource amount and type to reserve per job, e.g. ‘4GB’ or ‘2CPU’; the run will then use as many jobs as fit within the system resources of the specified type. You can add a margin after a colon, as in ‘4GB:10GB’ or ‘2CPU:10%’. You can also specify a total or maximum number of jobs, such as ‘10total’ or ‘10max’. Multiple criteria can be provided as a comma-separated list, for example ‘4GB:20%, 2CPU, 5max’. If neither ReservePerJob nor NumJobs is specified, a single job will run. The system also runs serially in debug mode and on platforms that do not cleanly support multiprocessing. Tip: a good way to size this is to perform a serial run, monitor how much peak RAM a single job takes, and then set this to <PeakUsage>GB:<YourMargin>GB, where YourMargin is however much you want to leave to other programs, e.g., 5GB (this depends on what else you expect to be running on the machine).
- UseHashesbool
Whether to bake hashes into intermediate file names; if you experiment with alternative preprocessing settings, it is recommended to enable this or disable the SkipIfPresent option since otherwise the routine may pick up a stale result.
- ReturnDatabool
Whether to return the final EEG data objects as a list. Note that this can use quite a lot of memory for large studies and it may be better to iterate over the preprocessed files in downstream analyses.
- OnlyChannelsWithPositionbool
Whether to retain only channels for which positions were recorded or could be inferred. If this is not set, then OnlyModalities should be set so as to retain only modalities that should be preprocessed together.
- OnlyModalitiesSequence[str], optional
If set, retain only channels that have the associated modalities. If enabled, this is typically set to [‘EEG’] but may also include other ExG modalities such as EOG or EMG that have the same unit and scale as EEG. If non-electrophysiological modalities are included, some artifact removal steps may not function correctly.
- SamplingRatefloat
Desired sampling rate for the preprocessed data. If not specified, will retain the original sampling rate.
- WithInterpbool
Whether to reinterpolate dropped channels, thus retaining the same channel count as the raw data.
- WithICAbool
Whether to apply PICARD ICA decomposition after cleaning.
- AmicaArgsdict or None
Additional keyword arguments for AMICA when ICAAlgorithm='amica', e.g. {'num_models': 2, 'max_iter': 500}.
- WithICLabelbool
Whether to apply ICLabel classification after ICA. Normally requires WithICA=True.
- CommonAverageReferencebool
Whether to transform the EEG data to a common average referencing scheme; recommended for cross-study processing.
- ChannelCriterionfloat or ‘off’
Minimum channel correlation threshold for channel cleaning; channels below this value are considered bad. Pass ‘off’ to skip channel criterion. Default 0.8.
- LineNoiseCriterionfloat or ‘off’
Z-score threshold for line-noise contamination; channels exceeding this are considered bad. ‘off’ disables line-noise check. Default 4.0.
- BurstCriterionfloat or ‘off’
ASR standard-deviation cutoff for high-amplitude bursts; values above this relative to calibration data are repaired (or removed if BurstRejection='on'). 'off' skips ASR. Default 5.0.
- WindowCriterionfloat or ‘off’
Fraction (0-1) or count of channels allowed to be bad per window; windows with more bad channels are removed. ‘off’ disables final window removal. Default 0.25.
- Highpasstuple(float, float) or ‘off’
Transition band [low, high] in Hz for initial high-pass filtering. ‘off’ skips drift removal. Default (0.25, 0.75).
- ChannelCriterionMaxBadTimefloat
Maximum tolerated time (seconds or fraction of recording) a channel may be flagged bad before being removed. Default 0.5.
- BurstCriterionRefMaxBadChnsfloat or ‘off’
Maximum fraction of bad channels tolerated when selecting calibration data for ASR. ‘off’ uses all data for calibration. Default 0.075.
- BurstCriterionRefTolerancestuple(float, float) or ‘off’
Power Z-score tolerances for selecting calibration windows in ASR. ‘off’ uses all data. Default (-inf, 5.5).
- BurstRejectionstr
‘on’ to reject (drop) burst segments instead of reconstructing with ASR, ‘off’ to apply ASR repair. Default ‘off’.
- WindowCriterionTolerancestuple(float, float) or ‘off’
Power Z-score bounds for final window removal. ‘off’ disables this stage. Default (-inf, 7).
- FlatlineCriterionfloat or ‘off’
Maximum flatline duration in seconds; channels exceeding this are removed. ‘off’ disables flatline removal. Default 5.0.
- NumSamplesint
Number of RANSAC samples for channel cleaning. Default 50.
- NoLocsChannelCriterionfloat
Correlation threshold for fallback channel cleaning when no channel locations. Default 0.45.
- NoLocsChannelCriterionExcludedfloat
Fraction of channels excluded when assessing correlation in nolocs cleaning. Default 0.1.
- MaxMemint
Maximum memory in MB for ASR processing. Default 64.
- Distancestr
Distance metric for ASR processing (‘euclidian’). Default ‘euclidian’.
- ChannelsSequence[str] or None
List of channel labels to include before cleaning (pop_select). Default None.
- Channels_ignoreSequence[str] or None
List of channel labels to exclude before cleaning. Default None.
- availableRAM_GBfloat or None
Available system RAM in GB to adjust MaxMem. Default None.
- EpochEventsstr or Sequence[str] or None
Optionally a list of event types or regular expression matching event types at which to time-lock epochs. If None (default), no epoching is done. If [], will time-lock to every event in the data (warning, this can amplify the data if epochs overlap!)
- EpochLimitsSequence[float]
The time limits in seconds relative to the event markers for epoching. Default (-1, 2).
- EpochBaselineSequence[float] or None
Optionally a time range in seconds relative to the event markers for baseline correction. If None (default), no baseline correction is applied. Within the range, the special value None can be used for either bound to refer to the respective end of the epoch limits, as in (None, 0).
- StageNamesSequence[str]
List of file name parts for the preprocessing stages, in the order cleaning, ICA, ICLabel (the default tuple additionally ends with an epoching entry, ‘desc-epoch’); these can be adjusted when working with different preprocessed versions (e.g., using different parameters for cleaning). It is recommended that these start with ‘desc-’.
- FinalDescstr or None
Optional desc- label for the final output file. If None (default), uses the last stage name from StageNames. If empty string ‘’, the output file has no desc- label (e.g., sub-01_task-rest_eeg.set instead of sub-01_task-rest_desc-cleaned_eeg.set).
- ReportDirstr or None
Optional directory for report JSON files. If None (default), reports are saved alongside the data files. If set (e.g., ‘code/reports’), reports are saved there relative to the output directory.
- MinimizeDiskUsagebool
Whether to minimize disk usage by not saving some intermediate files (specifically the PICARD output if WithICLabel=False). Default True.
- bidsmetadatabool
alias for ApplyMetadata
- bidseventbool
alias for ApplyEvents
- bidschanlocbool
alias for ApplyChanlocs
- eventtypestr
alias for EventColumn
- subjectsSequence[str | int], optional
alias for Subjects
- sessionsSequence[str | int], optional
alias for Sessions
- runsSequence[str | int], optional
alias for Runs
- tasksSequence[str] | str, optional
alias for Tasks
- outputdirstr
alias for OutputDir
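The sizing tip given for ReservePerJob can be made concrete with a little arithmetic. The sketch below is not part of eegprep: `reserve_spec` and `jobs_that_fit` are hypothetical helpers, and the formula (usable memory is total minus margin, divided by the per-job reservation) is an assumption used to illustrate the documented idea, not a description of how the library schedules jobs.

```python
import math


def reserve_spec(peak_gb: float, margin_gb: float) -> str:
    """Format a '<PeakUsage>GB:<YourMargin>GB' string (hypothetical helper)."""
    return f"{peak_gb:g}GB:{margin_gb:g}GB"


def jobs_that_fit(total_gb: float, peak_gb: float, margin_gb: float) -> int:
    """Estimate the job count as floor((total - margin) / peak).

    The formula is an assumption for illustration. At least one job is
    always reported so that a serial run remains possible.
    """
    usable_gb = total_gb - margin_gb
    return max(1, math.floor(usable_gb / peak_gb))


# Example: a serial run peaked at 4 GB and 10 GB should stay free.
spec = reserve_spec(peak_gb=4, margin_gb=10)
n_jobs = jobs_that_fit(total_gb=64, peak_gb=4, margin_gb=10)
print(spec, n_jobs)
```

The resulting string is what would be passed as `ReservePerJob`; the job estimate is only a planning aid for deciding whether the margin is realistic on a given machine.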
Returns#
- resultDict[str,Any] | List[Dict[str, Any]] | None
Depending on ReturnData, either a list of EEG objects (if BIDS root folder was specified) or a single EEG object (if a single file was specified), otherwise None.
- Parameters:
root (str)
ApplyChanlocs (bool | None)
ApplyEvents (bool | None)
ApplyMetadata (bool | None)
EventColumn (str | None)
SkipIfPresent (bool)
NumJobs (int | None)
ReservePerJob (str)
UseHashes (bool)
ReturnData (bool)
OutputDir (str | None)
SamplingRate (float | None)
OnlyChannelsWithPosition (bool)
WithInterp (bool)
WithICA (bool)
WithPicard (bool)
ICAAlgorithm (str)
AmicaArgs (dict | None)
WithICLabel (bool)
WithReport (bool)
CommonAverageReference (bool)
ChannelCriterionMaxBadTime (float)
BurstRejection (str)
NumSamples (int)
NoLocsChannelCriterion (float)
NoLocsChannelCriterionExcluded (float)
MaxMem (int)
Distance (str)
availableRAM_GB (float | None)
FinalDesc (str | None)
ReportDir (str | None)
MinimizeDiskUsage (bool)
SaveIntermediateStages (bool)
IntermediateDir (str | None)
bidschanloc (bool | None)
bidsevent (bool | None)
bidsmetadata (bool | None)
eventtype (str | None)
outputdir (str | None)
_lock (Lock | None)
_n_skipped (Value | None)
_k (int)
_n_total (int)
_n_jobs (int)
_t0 (float)
- Return type:
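A minimal calling sketch, using only parameters documented above, might look as follows. The task name and the command-line handling are assumptions made for illustration, and `preproc_kwargs` and `as_list` are hypothetical helpers rather than eegprep functions. The `__main__` guard reflects the note under NumJobs about running multiple processes.

```python
import sys
from typing import Any, Dict, List


def preproc_kwargs() -> Dict[str, Any]:
    """Collect keyword arguments (all names are documented parameters)."""
    return {
        "Subjects": [0, 1],            # zero-based indices; identifiers also work
        "Tasks": "rest",               # hypothetical task name
        "ReservePerJob": "4GB:10GB",   # 4 GB per job, 10 GB margin
        "UseHashes": True,             # avoid picking up stale results
        "WithICA": True,
        "WithICLabel": True,           # normally requires WithICA=True
        "ReturnData": False,           # iterate over output files instead
    }


def as_list(result: Any) -> List[Dict[str, Any]]:
    """Normalize the documented return shapes (None, dict, list) to a list."""
    if result is None:
        return []
    if isinstance(result, dict):
        return [result]
    return list(result)


def main(root: str) -> None:
    # Imported lazily so the sketch can be read without eegprep installed.
    import eegprep

    result = eegprep.bids_preproc(root, **preproc_kwargs())
    print(f"{len(as_list(result))} dataset(s) returned in memory")


# The guard is needed when jobs run in parallel processes.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # BIDS root passed on the command line
```

With `ReturnData=False` the call returns None, so downstream code would typically read the cleaned files from the output directory instead of holding every dataset in memory.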
- eegprep.bids_list_eeg_files(root, subjects=(), sessions=(), runs=(), tasks=())
Return a list of all EEG raw-data files in a BIDS dataset.
Parameters#
- rootstr
The root directory containing BIDS data.
- subjectsSequence[str | int], optional
A sequence of subject identifiers or (zero-based) indices to filter the files by. If empty, all subjects are included.
- sessionsSequence[str | int], optional
A sequence of session identifiers or (zero-based) indices to filter the files by. If empty, all sessions are included.
- runsSequence[str | int], optional
A sequence of run numbers or identifiers to filter the files by. If empty, all runs are included. Note that zero-based indexing does not apply to runs (unlike subjects and sessions), since run values are already integers.
- tasksSequence[str] | str, optional
A sequence of task names or single task to filter the files by. If empty, all tasks are included (default is an empty sequence).
Returns#
- List[str]
A list of file paths to EEG files in the BIDS dataset.
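The filter semantics described above (an empty filter keeps everything, strings are identifiers, integers are zero-based indices for subjects and sessions but literal numbers for runs) can be sketched in a few lines. `select_entities` is a hypothetical helper written for this illustration; it is not how eegprep resolves filters internally, and the identifier formats shown are made up.

```python
from typing import List, Sequence, Union


def select_entities(
    available: Sequence[str],
    selectors: Sequence[Union[str, int]] = (),
    *,
    ints_are_indices: bool = True,
) -> List[str]:
    """Resolve a filter against the identifiers found in a dataset."""
    if not selectors:
        # An empty filter includes everything.
        return list(available)
    chosen = set()
    for sel in selectors:
        if isinstance(sel, int) and ints_are_indices:
            chosen.add(available[sel])   # zero-based index
        else:
            chosen.add(str(sel))         # identifier or literal run number
    # Preserve the dataset's own ordering.
    return [item for item in available if item in chosen]


subjects = ["01", "02", "05"]
all_subjects = select_entities(subjects)
mixed = select_entities(subjects, [0, "05"])
runs = select_entities(["1", "2", "3"], [2], ints_are_indices=False)
print(all_subjects, mixed, runs)
```

Here the integer `0` selects the first subject by position, while the integer `2` passed as a run filter selects the run numbered 2 rather than the third run.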
Data Validation#
- eegprep.eeg_checkset(EEG, *checks, load_data=True)
Validate and set up EEG dataset structure.
Ensures EEG dict has required fields with correct types, computes ICA activations if possible, and loads data from file if specified.
Interactive Session#
- class eegprep.EEGPrepSession(EEG=<factory>, ALLEEG=<factory>, CURRENTSET=<factory>, ALLCOM=<factory>, LASTCOM='', STUDY=None, CURRENTSTUDY=0, PLUGINLIST=<factory>)
Bases: object
EEGLAB-like GUI state without module globals.
- Parameters:
- LASTCOM: str = ''
- CURRENTSTUDY: int = 0
- add_change_listener(listener)
Register a callback that runs after session state changes.
- Parameters:
listener (Callable[[EEGPrepSession], None])
- Return type:
None
- remove_change_listener(listener)
Remove a previously registered session change callback.
- Parameters:
listener (Callable[[EEGPrepSession], None])
- Return type:
None
- add_command_echo_listener(listener)
Register a callback for GUI commands to display in the console.
- remove_command_echo_listener(listener)
Remove a previously registered command echo callback.
- add_gui_action_listener(listener)
Register a callback for GUI action start/end notifications.
- remove_gui_action_listener(listener)
Remove a previously registered GUI action callback.
- begin_gui_action(action)
Notify listeners that a GUI action is about to run.
- Parameters:
action (str)
- Return type:
None
- end_gui_action(action)
Notify listeners that a GUI action has finished.
- Parameters:
action (str)
- Return type:
None
- gui_action(action)
Wrap a user-triggered GUI action for console/output synchronization.
- echo_command(command)
Display a GUI command without mutating session history.
- Parameters:
command (str | None)
- Return type:
None
- notify_changed()
Notify listeners that session-backed state changed.
- Return type:
None
- selected_dataset_indices()
Return the selected EEGLAB-facing dataset indices in order.
- store_current(eeg, *, new=False, command='', mark_saved=False, index=None)
Store eeg in ALLEEG and select it.
- retrieve(indices)
Select dataset(s) from ALLEEG using 1-based indices.
- apply_workspace_state(*, eeg=<object object>, alleeg=<object object>, currentset=<object object>, allcom=<object object>, lastcom=<object object>, study=<object object>, currentstudy=<object object>, command='', append_dataset_history=False)
Apply a GUI/console workspace update as one session transaction.
- delete_current()
Delete the current dataset selection from memory.
- Return type:
None
- clear_all()
Clear all datasets and study state.
- Return type:
None
- set_study(study, alleeg=None, *, command='')
Set STUDY/CURRENTSTUDY and optionally replace loaded datasets.
- select_study(*, command='CURRENTSTUDY = 1')
Select the current STUDY set in the shared workspace.
- Parameters:
command (str)
- Return type:
None
- add_history(command, *, notify=True)
Append an EEGLAB-style command to session history.
- clear_history(*, notify=True)
Clear command history and LASTCOM as one session mutation.
- Parameters:
notify (bool)
- Return type:
None
- remove_history(count, *, notify=True)
Remove the most recent count command-history entries.
- history_command_at(index)
Return the command at the given 1-based index, counting from the most recent history entry.
- clear_last_command(*, notify=True)
Clear LASTCOM without deleting ALLCOM.
- Parameters:
notify (bool)
- Return type:
None
- mark_current_saved()
Mark the current dataset selection as saved in EEG and ALLEEG.
- Return type:
None
- dataset_summaries()
Return (index, label, selected) tuples for the Datasets menu.
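The change-listener methods follow a familiar observer pattern: callbacks are registered, invoked with the session after a state change, and can be removed again. The sketch below uses a stand-in class, `MiniSession`, which borrows the method names documented above but is not the eegprep implementation; in particular, how eegprep stores listeners and orders ALLCOM is an assumption here.

```python
from typing import Callable, List


class MiniSession:
    """Stand-in for illustration only; not the eegprep implementation."""

    def __init__(self) -> None:
        self.ALLCOM: List[str] = []
        self.LASTCOM = ""
        self._listeners: List[Callable[["MiniSession"], None]] = []

    def add_change_listener(self, listener: Callable[["MiniSession"], None]) -> None:
        if listener not in self._listeners:
            self._listeners.append(listener)

    def remove_change_listener(self, listener: Callable[["MiniSession"], None]) -> None:
        if listener in self._listeners:
            self._listeners.remove(listener)

    def notify_changed(self) -> None:
        # Iterate over a copy so a listener may unregister itself safely.
        for listener in list(self._listeners):
            listener(self)

    def add_history(self, command: str, *, notify: bool = True) -> None:
        self.ALLCOM.append(command)  # ordering is an assumption
        self.LASTCOM = command
        if notify:
            self.notify_changed()


seen: List[str] = []


def on_change(session: MiniSession) -> None:
    seen.append(session.LASTCOM)


session = MiniSession()
session.add_change_listener(on_change)
session.add_history("EEG = pop_loadset('demo.set')")
session.remove_change_listener(on_change)
session.add_history("EEG = eeg_checkset(EEG)")  # no listener runs for this one
print(seen)
```

A GUI would typically register a refresh callback this way, so that menus and dataset lists stay in step with the session without relying on module-level globals.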
- class eegprep.EEGPrepConsoleWorkspace(session, *, window=None, refresh=None, command_echo=None, exports=None, extension_runtime=None)
Bases: object
Synchronize an IPython namespace with an EEGPrepSession.
- Parameters:
session (EEGPrepSession)
window (Any | None)
refresh (Callable[[], None] | None)
command_echo (Callable[[str], Any] | None)
exports (Mapping[str, Any] | None)
extension_runtime (ExtensionRuntime | None)
- close()
Detach this workspace from session notifications.
- Return type:
None
- pull_from_session()
Mirror session state into the console namespace.
- Return type:
None
- after_execute(source, *, success=True)
Push console-side workspace edits back into the session.
- accept_pop_result(result, args, kwargs=None)
Store a pop_* result in the current session when appropriate.
- pop_wrapper(name)
Return the console-aware wrapper for a public pop_* function.
- Parameters:
name (str)
- Return type:
ConsolePopFunction
- execute_history_command(command)
Execute an EEGLAB history command through the console namespace.
- Parameters:
command (str)
- Return type:
None
- eegprep.plugin_menu(pluginlist=None, *, parent=None, session=None, show=True, registry=None, catalog=None, catalog_path=None, include_bundled=True, include_entry_points=True, disabled_extensions=None)
Show or return the EEGPrep Extension Manager inventory.
- Parameters:
pluginlist (list[dict[str, Any]] | tuple[dict[str, Any], ...] | None) – Optional extension inventory to display. Defaults to the extension registry merged with the curated metadata catalog.
parent (Any | None) – Optional Qt parent widget for the dialog.
session (Any | None) – Optional EEGPrepSession; its PLUGINLIST mirror is updated with the displayed inventory.
show (bool) – Show the Qt dialog when True. Use False for scripts, examples, tests, or console inventory checks.
registry (ExtensionRegistry | None) – Optional discovered registry for tests or explicit control.
catalog (ExtensionCatalog | None) – Optional loaded catalog. Defaults to the packaged/local catalog.
catalog_path (str | None) – Optional JSON catalog path.
include_bundled (bool) – Include bundled EEGPrep plugin ports in default discovery.
include_entry_points (bool) – Include installed entry-point extensions in default discovery.
disabled_extensions (set[str] | list[str] | tuple[str, ...] | None) – Registry names to mark disabled during default discovery.
- Returns:
The normalized extension inventory as a mutable list of dictionaries. Records include install/update command strings but never execute them.
- Return type:
- eegprep.plugin_status(pluginname, *, exactmatch=False, pluginlist=None, registry=None, catalog=None, catalog_path=None, include_bundled=True, include_entry_points=True, disabled_extensions=None)
Return EEGLAB-style installed status for EEGPrep extensions.
- Parameters:
pluginname (str) – Plugin or extension name, package name, or substring to search.
exactmatch (bool) – Require exact case-insensitive name matching.
pluginlist (list[dict[str, Any]] | tuple[dict[str, Any], ...] | None) – Optional precomputed extension inventory. Defaults to the registry plus the curated catalog.
registry (ExtensionRegistry | None) – Optional discovered registry for tests or callers that need explicit discovery control.
catalog (ExtensionCatalog | None) – Optional loaded catalog. Defaults to the packaged/local catalog.
catalog_path (str | None) – Optional JSON catalog path.
include_bundled (bool) – Include bundled EEGPrep plugin ports in default discovery.
include_entry_points (bool) – Include installed entry-point extensions in default discovery.
disabled_extensions (set[str] | list[str] | tuple[str, ...] | None) – Registry names to mark disabled during default discovery.
- Returns:
A tuple (status, names, pluginstruct) where status values are 1 for active installed/bundled extensions and 0 for curated-only, disabled, incompatible, failed, or missing-dependency matches.
- Return type:
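Code that consumes the `(status, names, pluginstruct)` tuple usually only needs the names whose status is 1. The helper below is a hypothetical sketch: `active_names` is not an eegprep function, the sample values are made up, and because the text above does not say whether a single match is returned as a scalar or a sequence, both shapes are accepted.

```python
from typing import List, Sequence, Union


def active_names(
    status: Union[int, Sequence[int]],
    names: Union[str, Sequence[str]],
) -> List[str]:
    """Return the names whose status value is 1 (hypothetical helper)."""
    statuses = [status] if isinstance(status, int) else list(status)
    labels = [names] if isinstance(names, str) else list(names)
    return [name for flag, name in zip(statuses, labels) if flag == 1]


# Made-up values standing in for a plugin_status(...) result.
status, names, pluginstruct = [1, 0], ["ext-a", "ext-b"], [{}, {}]
active = active_names(status, names)
print(active)
```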
Object-Oriented Interface#
- class eegprep.EEGobj(EEG_or_path)
Bases: object
Wrapper class for EEG datasets stored as dictionaries.
Provides attribute access to EEG fields and method calls to eegprep functions.
- __init__(EEG_or_path)
Initialize from an EEG dict or a file path string.
If string: loads dataset with pop_loadset(path).
If dict: uses it directly.
- __getattr__(name)
Access EEG fields or eegprep functions.
If ‘name’ is a key in EEG, return EEG[name] (convenience).
If ‘name’ resolves to a function in eegprep, return a wrapper that calls func(deepcopy(self.EEG), …), stores the result in self.EEG, and returns the updated EEG for convenience.
Otherwise, raise AttributeError so that field-name typos fail fast instead of silently returning a no-op callable.
- __setattr__(name, value)
Set attributes on the underlying EEG dict when possible, else on the wrapper.
- __repr__()
Multi-line, MNE-like summary of the EEG object.
Shows key metadata, data shape, sampling info, time span, and brief events/channels info.
- __str__()
Multi-line, MNE-like summary of the EEG object.
Shows key metadata, data shape, sampling info, time span, and brief events/channels info.
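The attribute dispatch described for `__getattr__` and `__setattr__` can be sketched with a small stand-in. `MiniEEGobj`, the `FUNCTIONS` registry, and `set_name` are hypothetical and exist only for this illustration; the real class resolves functions from the eegprep package, and the rule used here for `__setattr__` (existing keys go to the dict, anything else to the wrapper) is an assumption about what "when possible" means.

```python
from copy import deepcopy


def set_name(EEG: dict, name: str) -> dict:
    """Toy processing function standing in for an eegprep function."""
    EEG["setname"] = name
    return EEG


FUNCTIONS = {"set_name": set_name}  # stand-in for the eegprep namespace


class MiniEEGobj:
    """Illustrative wrapper; not the eegprep implementation."""

    def __init__(self, EEG: dict) -> None:
        # Bypass our own __setattr__ so the dict lives on the wrapper.
        object.__setattr__(self, "EEG", EEG)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        EEG = self.__dict__["EEG"]
        if name in EEG:
            return EEG[name]
        if name in FUNCTIONS:
            def wrapper(*args, **kwargs):
                # Work on a deep copy, then store the result on the wrapper.
                result = FUNCTIONS[name](deepcopy(self.EEG), *args, **kwargs)
                object.__setattr__(self, "EEG", result)
                return result
            return wrapper
        raise AttributeError(name)  # typos fail fast

    def __setattr__(self, name, value) -> None:
        if name in self.EEG:
            self.EEG[name] = value
        else:
            object.__setattr__(self, name, value)


raw = {"setname": "raw", "srate": 250}
obj = MiniEEGobj(raw)
obj.set_name("cleaned")   # dispatches to set_name on a deep copy
obj.srate = 500           # writes through to the current EEG dict
try:
    obj.sratee            # misspelled field name
    typo_raised = False
except AttributeError:
    typo_raised = True
print(obj.setname, obj.srate, raw["setname"], typo_raised)
```

Because the wrapper operates on a deep copy, the dictionary originally passed in is left untouched, which is why `raw["setname"]` still holds its initial value after the call.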