Preprocessing Functions#

This section documents the preprocessing functions for artifact removal, channel operations, and signal processing.

Artifact Removal#

eegprep.clean_artifacts(EEG, ChannelCriterion=0.8, LineNoiseCriterion=4.0, BurstCriterion=5.0, WindowCriterion=0.25, Highpass=(0.25, 0.75), ChannelCriterionMaxBadTime=0.5, BurstCriterionRefMaxBadChns=0.075, BurstCriterionRefTolerances=(-inf, 5.5), BurstRejection=False, WindowCriterionTolerances=(-inf, 7), FlatlineCriterion=5.0, NumSamples=50, SubsetSize=0.25, NoLocsChannelCriterion=0.45, NoLocsChannelCriterionExcluded=0.1, MaxMem=64, Distance='euclidian', Channels=None, Channels_ignore=None, availableRAM_GB=None)#

All-in-one artifact removal, port of MATLAB clean_artifacts.

Removes flatline channels, low-frequency drifts, noisy channels, short-time bursts, and irrecoverable windows, in that order. Core parameters can be passed as None to use the default, or as ‘off’ to disable the corresponding stage.

Return type:

Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any], ndarray]

Parameters#

EEG : dict

Raw continuous EEG dataset dict (must include ‘data’, ‘srate’, ‘chanlocs’, etc.).

ChannelCriterion : float or ‘off’

Minimum channel correlation threshold for channel cleaning; channels below this value are considered bad. Pass ‘off’ to skip channel criterion. Default 0.8.

LineNoiseCriterion : float or ‘off’

Z-score threshold for line-noise contamination; channels exceeding this are considered bad. ‘off’ disables line-noise check. Default 4.0.

BurstCriterion : float or ‘off’

ASR standard-deviation cutoff for high-amplitude bursts; data exceeding this cutoff relative to the calibration data is repaired (or removed if BurstRejection is enabled). ‘off’ skips ASR. Default 5.0.

WindowCriterion : float or ‘off’

Fraction (0-1) or count of channels allowed to be bad per window; windows with more bad channels are removed. ‘off’ disables final window removal. Default 0.25.

Highpass : tuple(float, float) or ‘off’

Transition band [low, high] in Hz for initial high-pass filtering. ‘off’ skips drift removal. Default (0.25, 0.75).

ChannelCriterionMaxBadTime : float

Maximum tolerated time (seconds or fraction of recording) a channel may be flagged bad before being removed. Default 0.5.

BurstCriterionRefMaxBadChns : float or ‘off’

Maximum fraction of bad channels tolerated when selecting calibration data for ASR. ‘off’ uses all data for calibration. Default 0.075.

BurstCriterionRefTolerances : tuple(float, float) or ‘off’

Power Z-score tolerances for selecting calibration windows in ASR. ‘off’ uses all data. Default (-inf, 5.5).

BurstRejection : bool

True to reject (drop) burst segments instead of reconstructing them with ASR; False to apply ASR repair. Default False.

WindowCriterionTolerances : tuple(float, float) or ‘off’

Power Z-score bounds for final window removal. ‘off’ disables this stage. Default (-inf, 7).

FlatlineCriterion : float or ‘off’

Maximum flatline duration in seconds; channels exceeding this are removed. ‘off’ disables flatline removal. Default 5.0.

NumSamples : int

Number of RANSAC samples for channel cleaning. Default 50.

SubsetSize : float

Size of channel subsets for RANSAC, as fraction (0-1) or count. Default 0.25.

NoLocsChannelCriterion : float

Correlation threshold for fallback channel cleaning when no channel locations are available. Default 0.45.

NoLocsChannelCriterionExcluded : float

Fraction of channels excluded when assessing correlation in nolocs cleaning. Default 0.1.

MaxMem : int

Maximum memory in MB for ASR processing. Default 64.

Distance : str

Distance metric for ASR processing (‘euclidian’). Default ‘euclidian’.

Channels : sequence of str or None

List of channel labels to include before cleaning (pop_select). Default None.

Channels_ignore : sequence of str or None

List of channel labels to exclude before cleaning. Default None.

availableRAM_GB : float or None

Available system RAM in GB to adjust MaxMem. Default None.

Returns#

EEG : dict

Final cleaned EEG dataset.

HP : dict

EEG dataset after initial high-pass (drift removal).

BUR : dict

EEG dataset after ASR burst repair (before final window removal).

removed_channels : ndarray of bool

Mask indicating which channels were removed during cleaning.
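As a hedged usage sketch, the signature and return tuple above suggest a call along the following lines. The helpers `cleaning_options` and `run_cleaning` are hypothetical names introduced here for illustration and are not part of eegprep; only `clean_artifacts`, its keyword names, and its four return values are taken from this page.

```python
def cleaning_options(repair_bursts=True, remove_windows=True):
    """Build keyword arguments for clean_artifacts (hypothetical helper).

    A stage is disabled by passing the string 'off' for its criterion.
    """
    return {
        "FlatlineCriterion": 5.0,
        "ChannelCriterion": 0.8,
        "LineNoiseCriterion": 4.0,
        "BurstCriterion": 5.0 if repair_bursts else "off",
        "WindowCriterion": 0.25 if remove_windows else "off",
    }


def run_cleaning(EEG, **options):
    """Call clean_artifacts and unpack its four documented return values."""
    import eegprep  # imported lazily so cleaning_options works without it

    EEG_clean, HP, BUR, removed_channels = eegprep.clean_artifacts(EEG, **options)
    return EEG_clean, removed_channels


# Keep ASR burst repair but skip the final window-removal stage.
options = cleaning_options(remove_windows=False)
```

Passing the options as a dict keeps the choice of stages in one place, which can make it easier to record exactly which criteria were used for a given dataset.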

eegprep.clean_asr(EEG, cutoff=5.0, window_len=None, step_size=None, max_dims=0.66, ref_maxbadchannels=0.075, ref_tolerances=(-3.5, 5.5), ref_wndlen=1.0, use_gpu=False, useriemannian=None, maxmem=64)#

Run the Artifact Subspace Reconstruction (ASR) method on EEG data.

This is an automated artifact rejection function that ensures that the data contains no events that have abnormally strong power; the subspaces on which those events occur are reconstructed (interpolated) based on the rest of the EEG signal during these time periods.

Parameters:
  • EEG (Dict[str, Any]) – EEG data structure. Expected fields: ‘data’ (np.ndarray): Channels x Samples matrix. ‘srate’ (float): Sampling rate in Hz. ‘nbchan’ (int): Number of channels. It’s assumed the data is zero-mean (e.g., high-pass filtered).

  • cutoff (float, optional) – Standard deviation cutoff for rejection. Data portions whose variance is larger than this threshold relative to the calibration data are considered artifactual and removed. Aggressive: 3, Default: 5, Conservative: 20.

  • window_len (float, optional) – Length of the statistics window in seconds. Should not be much longer than artifact timescale. Samples in window should be >= 1.5x channels. Default: max(0.5, 1.5 * nbchan / srate).

  • step_size (int, optional) – Step size for processing in samples. Reconstruction matrix updated every step_size samples. If None, defaults to window_len / 2 samples.

  • max_dims (float, optional) – Maximum dimensionality/fraction of dimensions to reconstruct. Default: 0.66.

  • ref_maxbadchannels (Union[float, str, np.ndarray], optional) – Parameter for automatic calibration data selection. float: Max fraction (0-1) of bad channels tolerated in a window for it to be used as calibration data. Lower is more aggressive (e.g., 0.05). Default: 0.075. ‘off’: Use all data for calibration. Assumes artifact contamination < ~30-50%. np.ndarray: Directly provides the calibration data (channels x samples).

  • ref_tolerances (Union[Tuple[float, float], str], optional) – Power tolerances (lower, upper) in SDs from robust EEG power for a channel to be considered ‘bad’ during calibration data selection. Default: (-3.5, 5.5). Use ‘off’ to disable.

  • ref_wndlen (Union[float, str], optional) – Window length in seconds for calibration data selection granularity. Default: 1.0. Use ‘off’ to disable.

  • use_gpu (bool, optional) – Whether to try using GPU (requires compatible hardware and libraries, currently ignored). Default: False.

  • useriemannian (str, optional) – Option to use a Riemannian ASR variant. Can be set to ‘calib’ to use a Riemannian estimate at calibration time; this makes somewhat different statistical tradeoffs than the default, resulting in a somewhat different baseline rejection threshold, so it is suggested to visually check results and adjust the cutoff as needed. Default: None (disabled).

  • maxmem (Optional[int], optional) – Maximum memory in MB (passed to asr_calibrate/process, but chunking based on it is not implemented in Python port). Default: 64.

Return type:

Dict[str, Any]

Returns#

Dict[str, Any] : The EEG dictionary with the ‘data’ field containing the cleaned data.

Raises#

NotImplementedError : If useriemannian is True.

ImportError : If automatic calibration data selection is needed (ref_maxbadchannels is float) but clean_windows cannot be imported.

ValueError : If input arguments are invalid or calibration fails critically.
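The documented defaults for window_len and step_size can be worked through numerically. The sketch below is not eegprep code: `default_asr_window` is a hypothetical helper, and rounding the step size down to a whole sample is an assumption, since this page only states "window_len / 2 samples".

```python
import math


def default_asr_window(nbchan, srate):
    """Return (window_len in seconds, step_size in samples) from the documented defaults."""
    # Documented default: max(0.5, 1.5 * nbchan / srate)
    window_len = max(0.5, 1.5 * nbchan / srate)
    # Documented default: half the window, expressed in samples.
    # Rounding down is an assumption of this sketch.
    step_size = math.floor(window_len * srate / 2)
    return window_len, step_size


# 64 channels at 250 Hz: 1.5 * 64 / 250 = 0.384 s, so the 0.5 s floor applies.
window_len, step_size = default_asr_window(nbchan=64, srate=250)
```

For high channel counts at low sampling rates the second term dominates, which keeps the number of samples per window at or above 1.5 times the number of channels, as the window_len description recommends.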

eegprep.clean_flatlines(EEG, max_flatline_duration=5.0, max_allowed_jitter=20.0)#

Remove (near-) flat-lined channels.

This is an automated artifact rejection function which ensures that the data contains no flat-lined channels.

Parameters:
  • EEG (Dict[str, Any]) – the continuous-time EEG data structure

  • max_flatline_duration (float) – maximum tolerated flatline duration. In seconds. If a channel has a longer flatline than this, it will be considered abnormal.

  • max_allowed_jitter (float) – maximum tolerated jitter during flatlines. As a multiple of epsilon.

Returns#

EEG : the EEG data structure with flatlined channels removed.

Example

EEG = clean_flatlines(EEG)
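To illustrate how the two parameters interact, here is a minimal single-channel sketch of a flatline test. It is not the library's implementation: `is_flatlined` is a hypothetical helper, and taking "epsilon" to be double-precision machine epsilon is an assumption.

```python
import sys


def is_flatlined(samples, srate, max_flatline_duration=5.0, max_allowed_jitter=20.0):
    """Return True if the longest near-constant run exceeds the allowed duration."""
    # Sample-to-sample changes below this tolerance count as "flat".
    tolerance = max_allowed_jitter * sys.float_info.epsilon
    longest = current = 0
    for previous, value in zip(samples, samples[1:]):
        if abs(value - previous) < tolerance:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    # Compare the run length (in samples) with the allowed duration (in seconds).
    return longest > max_flatline_duration * srate


srate = 10  # Hz, hypothetical
stuck = [1.0] * 100 + [float(i) for i in range(20)]   # about 10 s of constant signal
varying = [float(i % 7) for i in range(120)]          # never constant
```

With these inputs, `is_flatlined(stuck, srate)` flags the channel and `is_flatlined(varying, srate)` does not.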

eegprep.clean_drifts(EEG, transition=(0.5, 1), attenuation=80.0, method='fft')#

Remove drifts from the data using a forward-backward high-pass filter.

This removes drifts from the data using a forward-backward (non-causal) filter. NOTE: If you are doing directed information flow analysis, do not use this filter; use a different (causal) one instead.

Parameters:
  • EEG (Dict[str, Any]) – the continuous-time EEG data structure

  • transition (Sequence[float]) – the transition band in Hz, i.e. lower and upper edge of the transition as in (lo,hi)

  • attenuation (float) – stop-band attenuation, in dB

  • method (str) – the method to use for filtering (‘fft’ or ‘fir’)

Return type:

Dict[str, Any]

Returns#

EEG : the filtered EEG data structure
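A hedged usage sketch follows. `validate_transition` and `remove_drifts` are hypothetical wrappers, not eegprep functions, and whether eegprep performs its own check of the band edges is not stated on this page; only the `clean_drifts` call and its keyword names come from the signature above.

```python
def validate_transition(transition):
    """Check a (lo, hi) transition band in Hz and return it as a tuple of floats."""
    lo, hi = (float(edge) for edge in transition)
    if not 0 < lo < hi:
        raise ValueError(f"expected 0 < lo < hi, got ({lo}, {hi})")
    return lo, hi


def remove_drifts(EEG, transition=(0.5, 1.0)):
    """Validate the band, then apply clean_drifts with its documented arguments."""
    import eegprep  # imported lazily; requires eegprep to be installed

    band = validate_transition(transition)
    return eegprep.clean_drifts(EEG, transition=band, attenuation=80.0, method="fft")


# The Highpass default of clean_artifacts, expressed as a transition band.
band = validate_transition([0.25, 0.75])
```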

eegprep.clean_windows(EEG, max_bad_channels=0.2, zthresholds=(-3.5, 5), window_len=1.0, window_overlap=0.66, max_dropout_fraction=0.1, min_clean_fraction=0.25, truncate_quant=(0.022, 0.6), step_sizes=(0.01, 0.01), shape_range=array([1.7, 1.85, 2., 2.15, 2.3, 2.45, 2.6, 2.75, 2.9, 3.05, 3.2, 3.35, 3.5]))#

Remove periods with abnormally high-power content from continuous data.

This function cuts segments from the data which contain high-power artifacts. Specifically, only windows are retained which have less than a certain fraction of bad channels, where a channel is bad in a window if its RMS power is above or below some z-threshold relative to a robust estimate of clean EEG power in that channel.

Return type:

Tuple[Dict[str, Any], ndarray]

Args#

EEG : dict

Continuous dataset using the EEGLAB dict schema. The data is expected to be high-passed appropriately (>1 Hz recommended).

max_bad_channels : int | float

The maximum number or fraction of channels that may exceed the thresholds inside a time-window for the window to be kept. Values in (0,1) are interpreted as a fraction; otherwise as an absolute count.

zthresholds : tuple(float, float)

Lower and upper z-score limits for RMS power ([low, high]).

window_len : float

Window length in seconds. Should be at least half a period of the high-pass cut-off that was used. Default is 1 s.

window_overlap : float

Fractional overlap between consecutive windows (0-1). Higher overlap finds more artefacts but is slower. Default is 0.66 (≈⅔ overlap).

max_dropout_fraction : float

Maximum fraction of windows that may have arbitrarily low amplitude (e.g. sensor unplugged). Default is 0.1.

min_clean_fraction : float

Minimum fraction of windows expected to be clean (essentially uncontaminated EEG). Default is 0.25.

truncate_quant : tuple(float, float)

Quantile range of the truncated Gaussian to fit (default (0.022,0.6)).

step_sizes : tuple(float, float)

Grid-search step sizes in quantiles for lower/upper edge.

shape_range : sequence(float)

Range for the beta shape parameter in the generalised Gaussian used for distribution fitting.

Returns#

EEG : dict

The passed-in structure with bad time periods excised.

sample_mask : np.ndarray[bool]

Boolean mask (length == original pnts) indicating which samples are retained (True) or removed (False).
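The keep-or-remove rule for a single window can be sketched as follows. This is a simplified illustration and not the library's implementation: both helpers are hypothetical, rounding the fractional limit to the nearest integer is an assumption, and low and high threshold violations are pooled into one count for brevity.

```python
def max_bad_count(max_bad_channels, nbchan):
    """Interpret max_bad_channels as a fraction in (0, 1) or an absolute count."""
    if 0 < max_bad_channels < 1:
        return round(nbchan * max_bad_channels)
    return int(max_bad_channels)


def keep_window(zscores, zthresholds, allowed_bad):
    """Keep a window if at most allowed_bad channels fall outside the z-score limits."""
    low, high = zthresholds
    n_bad = sum(1 for z in zscores if z < low or z > high)
    return n_bad <= allowed_bad


allowed = max_bad_count(0.2, nbchan=32)   # 20% of 32 channels, rounded
window_z = [0.1, -0.4, 6.2, 1.3]          # hypothetical per-channel z-scores for one window
kept = keep_window(window_z, zthresholds=(-3.5, 5), allowed_bad=0)
```

Here one channel exceeds the upper limit of 5, so the window is dropped when no bad channels are allowed and kept when one is.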

Channel Operations#

eegprep.clean_channels(EEG, corr_threshold=0.8, noise_threshold=5.0, window_len=5, max_broken_time=0.4, num_samples=50, subset_size=0.25)#

Remove channels with problematic data from a continuous data set.

This is an automated artifact rejection function which ensures that the data contains no channels that record only noise for extended periods of time. If channels with control signals are contained in the data these are usually also removed. The criterion is based on correlation: if a channel has lower correlation to its robust estimate (based on other channels) than a given threshold for a minimum period of time (or percentage of the recording), it will be removed.

Parameters:
  • EEG (Dict[str, Any]) – Continuous data set, assumed to be appropriately high-passed (e.g. >0.5Hz or with a 0.5Hz - 2.0Hz transition band).

  • corr_threshold (float) – Correlation threshold. If a channel is correlated at less than this value to its robust estimate (based on other channels), it is considered abnormal in the given time window.

  • noise_threshold (float) – If a channel has more (high-frequency) noise relative to its signal than this value, in standard deviations from the channel population mean, it is considered abnormal.

  • window_len (float) – Length of the windows (in seconds) for which correlation is computed; ideally short enough to reasonably capture periods of global artifacts or intermittent sensor dropouts, but not shorter (for statistical reasons).

  • max_broken_time (float) – Maximum time (either in seconds or as fraction of the recording) during which a channel is allowed to have artifacts. Reasonable range: 0.1 (very aggressive) to 0.6 (very lax).

  • num_samples (int) – Number of samples generated for a RANSAC reconstruction. This is the number of samples to generate in the random sampling consensus process. The larger this value, the more robust but also slower the processing will be.

  • subset_size (float) – Subset size. This is the size of the channel subsets to use for robust reconstruction, as a number or fraction of the total number of channels.

Return type:

Dict[str, Any]

Returns#

EEG : data set with bad channels removed
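Because max_broken_time accepts either seconds or a fraction of the recording, it can help to see both readings converted to samples. The sketch below is illustrative only: `max_broken_samples` is a hypothetical helper, and treating values strictly between 0 and 1 as fractions is an assumption, since this page does not state where the two interpretations are separated.

```python
def max_broken_samples(max_broken_time, srate, pnts):
    """Convert max_broken_time to a number of samples.

    Assumption: values strictly between 0 and 1 are fractions of the
    recording; anything else is a duration in seconds.
    """
    if 0 < max_broken_time < 1:
        return pnts * max_broken_time   # fraction of the recording
    return srate * max_broken_time      # duration in seconds


srate, pnts = 250, 150_000              # hypothetical 10-minute recording at 250 Hz
as_fraction = max_broken_samples(0.4, srate, pnts)
as_seconds = max_broken_samples(30, srate, pnts)
```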

eegprep.clean_channels_nolocs(EEG, min_corr=0.45, ignored_quantile=0.1, window_len=2.0, max_broken_time=0.5, linenoise_aware=True)#

Remove channels with abnormal data from a continuous data set.

This is an automated artifact rejection function which ensures that the data contains no channels that record only noise for extended periods of time. If channels with control signals are contained in the data these are usually also removed. The criterion is based on correlation: if a channel is decorrelated from all others (pairwise correlation < a given threshold), excluding a given fraction of most correlated channels – and if this holds on for a sufficiently long fraction of the data set – then the channel is removed.

Parameters:
  • EEG (Dict[str, Any]) – Continuous data set, assumed to be appropriately high-passed (e.g. >0.5Hz or with a 0.5Hz - 2.0Hz transition band).

  • min_corr (float) – Minimum correlation between a channel and any other channel (in a short period of time) below which the channel is considered abnormal for that time period. Reasonable range: 0.4 (very lax) to 0.6 (quite aggressive).

  • ignored_quantile (float) – Fraction of channels that need to have at least the given min_corr value w.r.t. the channel under consideration. This makes it possible to handle channels or small groups of channels that measure the same noise source. Reasonable range: 0.05 (rather lax) to 0.2 (very tolerant re disconnected/shorted channels).

  • window_len (float) – Length of the windows (in seconds) for which correlation is computed.

  • max_broken_time (float) – Maximum time (either in seconds or as fraction of the recording) during which a retained channel may be broken. Reasonable range: 0.1 (very aggressive) to 0.6 (very lax).

  • linenoise_aware (bool) – Whether the operation should be performed in a line-noise aware manner. If enabled, the correlation measure will not be affected by the presence or absence of line noise (using a temporary notch filter).

Return type:

Tuple[Dict[str, Any], ndarray]

Returns#

EEG : data set with bad channels removed

removed_channels : boolean array indicating which channels were removed
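The returned mask can be turned into a list of channel labels, for instance to log what was dropped. This sketch assumes the mask follows the channel order of the dataset before cleaning and that each chanlocs entry is a dict with a ‘labels’ key; `removed_labels` is a hypothetical helper, not an eegprep function.

```python
def removed_labels(chanlocs, removed_channels):
    """List the labels of channels flagged True in a removal mask."""
    return [loc["labels"] for loc, removed in zip(chanlocs, removed_channels) if removed]


# Hypothetical channel locations captured before cleaning, and a matching mask.
chanlocs_before = [{"labels": "Fp1"}, {"labels": "Fp2"}, {"labels": "Cz"}]
mask = [False, True, False]
bad = removed_labels(chanlocs_before, mask)
```

The channel locations have to be saved before cleaning, because the cleaned dataset no longer contains the removed entries.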

eegprep.eeg_interp(EEG, bad_chans, method='spherical', t_range=None, params=None, dtype='float32')#

Interpolate missing or bad EEG channels using spherical spline interpolation.

Parameters#

EEG : dict

EEG data structure with ‘data’, ‘chanlocs’, ‘nbchan’, etc.

bad_chans : list, array-like, or list of dicts

Can be one of:

- List of channel names (strings): e.g., [‘Fp1’, ‘Fp2’]
- List of channel indices (integers): e.g., [0, 1, 2]
- List of chanloc structures (dicts): e.g., [{‘labels’: ‘T7’, ‘X’: 0.8, ‘Y’: 0.0, ‘Z’: 0.6}, …]

When chanloc structures are provided, the function supports three modes:

1. If chanlocs are identical to EEG[‘chanlocs’], returns data unchanged
2. If no overlap with existing channels, appends new channels and interpolates them
3. If existing channels are a subset, remaps data to new channel structure

method : str, optional

Interpolation method (‘spherical’, ‘sphericalKang’, ‘sphericalCRD’, ‘sphericalfast’)

t_range : tuple, optional

Time range for interpolation

params : tuple, optional

Method-specific parameters

dtype : str | dtype, optional

Optionally the precision in which to perform the computation:

- ‘float32’: matches MATLAB, but limits precision (default)
- ‘float64’: operates at full precision; requires twice the memory

Returns#

EEG : dict

Updated EEG structure with interpolated channels
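The three accepted forms of bad_chans can be told apart by element type. The sketch below only illustrates that distinction; `classify_bad_chans` is a hypothetical helper, and how eegprep itself dispatches on the input is not described on this page.

```python
def classify_bad_chans(bad_chans):
    """Report which documented form a bad_chans list takes (hypothetical helper)."""
    if all(isinstance(item, str) for item in bad_chans):
        return "names"
    if all(isinstance(item, int) for item in bad_chans):
        return "indices"
    if all(isinstance(item, dict) and "labels" in item for item in bad_chans):
        return "chanlocs"
    raise TypeError("bad_chans mixes or uses unsupported element types")


by_name = ["Fp1", "Fp2"]
by_index = [0, 1, 2]
by_chanloc = [{"labels": "T7", "X": 0.8, "Y": 0.0, "Z": 0.6}]
```

Any of these three lists could then be passed as the second argument, for example `eeg_interp(EEG, by_name, method='spherical')`.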

Signal Processing#

eegprep.pop_resample(EEG, freq, engine=None)#

Resample EEG data to a new sampling rate.

Parameters#

EEG : dict

EEGLAB EEG structure.

freq : float

New sampling rate in Hz.

engine : str or None

Engine to use for implementation. Options are:

- None: Use the default Python implementation
- ‘poly’: Use scipy’s resample_poly function
- ‘matlab’: Use MATLAB engine
- ‘octave’: Use Octave engine

Returns#

EEG : dict

EEGLAB EEG structure with resampled data.
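Polyphase resampling works with an integer up/down ratio, so it can be useful to see how a target rate maps onto such a ratio and onto the resulting number of samples. This is a sketch under stated assumptions: both helpers and the `max_denominator` limit are hypothetical, and how eegprep derives its factors is not documented here. The length formula follows the behaviour of `scipy.signal.resample_poly`, which rounds the output length up.

```python
from fractions import Fraction


def resample_ratio(srate, freq, max_denominator=10000):
    """Return integer (up, down) factors approximating freq / srate."""
    ratio = (Fraction(freq) / Fraction(srate)).limit_denominator(max_denominator)
    return ratio.numerator, ratio.denominator


def resampled_length(pnts, up, down):
    """Number of output samples, rounded up (integer ceiling division)."""
    return (pnts * up + down - 1) // down


up, down = resample_ratio(srate=500, freq=256)   # 256 / 500 reduces to 64 / 125
new_pnts = resampled_length(1000, up, down)
```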

eegprep.pop_rmbase(EEG, timerange=None, pointrange=None, chanlist=None)#

POP_RMBASE - remove channel baseline means from an epoched or continuous EEG dataset.

Return type:

dict

Parameters#

EEG : dict

EEGLAB-like EEG structure with keys: ‘data’, ‘nbchan’, ‘pnts’, ‘trials’, ‘times’, ‘event’. Event latencies are 1-based indices (EEGLAB convention).

timerange : [min_ms, max_ms] or None

Baseline latency range in milliseconds; overrides pointrange when provided.

pointrange : iterable of indices or [start, end]

Baseline sample indices (0-based or 1-based tolerated). If None/empty, use whole epoch.

chanlist : iterable of channel indices (0-based)

If None, all channels are used.

Returns#

EEG : dict

Updated EEG structure with baseline removed. EEG[‘icaact’] is cleared.
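Baseline removal amounts to subtracting each channel's mean over the baseline window. The following single-epoch sketch in plain Python illustrates the arithmetic; it is not eegprep's implementation, `remove_baseline` is a hypothetical helper, and treating both ends of timerange as inclusive is an assumption.

```python
def remove_baseline(data, times, timerange=None):
    """Subtract each channel's mean over the baseline window (single epoch).

    data: list of channels, each a list of samples; times: latencies in ms.
    """
    if timerange is None:
        baseline = range(len(times))  # whole epoch
    else:
        lo, hi = timerange
        # Inclusive bounds are an assumption of this sketch.
        baseline = [i for i, t in enumerate(times) if lo <= t <= hi]
    corrected = []
    for channel in data:
        mean = sum(channel[i] for i in baseline) / len(baseline)
        corrected.append([sample - mean for sample in channel])
    return corrected


times = [-200, -100, 0, 100]
data = [[1.0, 2.0, 3.0, 4.0]]
corrected = remove_baseline(data, times, timerange=(-200, 0))
```

In this example the baseline mean is 2.0 (the first three samples), so the corrected channel is centred on the pre-stimulus interval rather than on the whole epoch.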