Preprocessing Pipeline#
This guide provides a comprehensive overview of the eegprep preprocessing pipeline, including the order of operations, parameter tuning, and quality control.
Pipeline Overview#
The eegprep preprocessing pipeline is designed to systematically clean and prepare raw EEG data for analysis. The pipeline removes artifacts, interpolates bad channels, resamples data, applies filtering, performs ICA decomposition, and classifies independent components.
Key Features:
Automated artifact detection and removal: Identifies and removes noisy channels and time windows
Flexible component classification: Uses ICLabel for automatic ICA component classification
Customizable parameters: Adjust thresholds and methods for your specific needs
Quality control: Built-in checks and visualizations to assess preprocessing quality
Batch processing: Process multiple subjects efficiently with
eegprep.bids_preproc()
Pipeline Steps#
The preprocessing pipeline follows these steps in order:
Channel Selection
Artifact Removal (ASR and clean_artifacts)
Channel Interpolation
Resampling
Filtering
ICA Decomposition
Component Classification (ICLabel)
Step 1: Channel Selection#
Select the channels to include in preprocessing:
from eegprep import pop_select
# Select only EEG channels
eeg = pop_select(eeg, 'type', 'EEG')
# Select specific channels by name
eeg = pop_select(eeg, 'channel', ['Cz', 'Pz', 'Oz', 'Fz'])
# Remove specific channels
eeg = pop_select(eeg, 'nochannel', ['HEOG', 'VEOG'])
When to use: Always perform channel selection first to ensure you’re working with the correct data.
Step 2: Artifact Removal#
Remove noisy channels and time windows using multiple methods:
Flatline Detection#
Remove channels with no variation (dead channels):
from eegprep import clean_flatlines
eeg = clean_flatlines(
eeg,
flatline_criterion=5 # Flatline duration in seconds
)
Noisy Channel Removal#
Remove channels with excessive noise:
from eegprep import clean_channels
eeg = clean_channels(
eeg,
ransac_criterion=0.8, # RANSAC correlation threshold
max_broken_time=0.5 # Max proportion of broken time
)
Artifact Subspace Reconstruction (ASR)#
Remove bursts of high-amplitude artifacts:
from eegprep import clean_asr
eeg = clean_asr(
eeg,
asr_criterion=20, # Standard deviation threshold
asr_wlen=0.5 # Window length in seconds
)
Comprehensive Artifact Removal#
Use the all-in-one eegprep.clean_artifacts() function:
from eegprep import clean_artifacts
eeg = clean_artifacts(
eeg,
flatline_criterion=5,
highpass=1,
lowpass=100,
asr_criterion=20,
asr_wlen=0.5,
remove_channels=True,
remove_windows=True
)
Parameters:
flatline_criterion: Duration (seconds) of flatline to detect (default: 5)asr_criterion: Standard deviation threshold for ASR (default: 20)asr_wlen: Window length for ASR in seconds (default: 0.5)highpass: High-pass filter frequency in Hz (default: 1)lowpass: Low-pass filter frequency in Hz (default: 100)
Step 3: Channel Interpolation#
Interpolate removed channels using spherical spline interpolation:
from eegprep import eeg_interp
# Interpolate removed channels
eeg = eeg_interp(eeg)
# Interpolate specific channels
eeg = eeg_interp(eeg, channels=[1, 5, 10])
When to use: After removing noisy channels, interpolate them to maintain spatial coverage.
Step 4: Resampling#
Resample data to a lower sampling rate to reduce file size and computation:
from eegprep import pop_resample
# Resample to 250 Hz
eeg = pop_resample(eeg, 250)
# Resample to 500 Hz
eeg = pop_resample(eeg, 500)
Common sampling rates:
250 Hz: Standard for most EEG analysis
500 Hz: Higher resolution for detailed analysis
100 Hz: Lower resolution for quick analysis
When to use: Resample early in the pipeline to reduce computation time for subsequent steps.
Step 5: Filtering#
Apply frequency filtering to remove noise outside the frequency band of interest:
High-Pass Filtering#
Remove slow drifts and DC offset:
from eegprep import pop_eegfiltnew
# High-pass filter at 1 Hz
eeg = pop_eegfiltnew(eeg, locutoff=1)
Low-Pass Filtering#
Remove high-frequency noise:
# Low-pass filter at 100 Hz
eeg = pop_eegfiltnew(eeg, hicutoff=100)
Band-Pass Filtering#
Apply both high-pass and low-pass filters:
# Band-pass filter 1-100 Hz
eeg = pop_eegfiltnew(eeg, locutoff=1, hicutoff=100)
Common filter settings:
Resting state: 1-100 Hz
Event-related potentials (ERP): 0.1-30 Hz
Oscillatory analysis: 1-100 Hz
High-frequency activity: 1-200 Hz
When to use: Apply filtering after resampling but before ICA for best results.
Step 6: ICA Decomposition#
Decompose the data into independent components:
Using Picard Algorithm#
Fast and reliable ICA decomposition:
from eegprep import eeg_picard
eeg = eeg_picard(
eeg,
ncomps=None, # Number of components (None = number of channels)
max_iter=500 # Maximum iterations
)
Using Extended Infomax ICA#
Alternative ICA algorithm:
from eegprep import eeg_picard
# Picard is recommended, but you can adjust parameters
eeg = eeg_picard(eeg, ncomps=eeg.nbchan)
Parameters:
ncomps: Number of components to extract (default: number of channels)max_iter: Maximum iterations (default: 500)
When to use: After filtering, before component classification.
Step 7: Component Classification (ICLabel)#
Automatically classify ICA components using ICLabel:
from eegprep import iclabel
eeg = iclabel(eeg)
# Access component labels
print(eeg.etc.ic_classification.ICLabel.classes)
print(eeg.etc.ic_classification.ICLabel.classifications)
Component types:
Brain: Neural activity
Muscle: Muscle artifacts
Eye: Eye movement artifacts
Heart: Cardiac artifacts
Line Noise: 50/60 Hz noise
Channel Noise: Noisy channels
Other: Unclassified
Removing artifact components:
# Remove muscle and eye components
artifact_components = []
for i, label in enumerate(eeg.etc.ic_classification.ICLabel.classes):
if label in ['Muscle', 'Eye']:
artifact_components.append(i)
# Remove components
eeg.icaact = None # Clear cached ICA activity
eeg = pop_select(eeg, 'nochannel', artifact_components)
Pipeline Visualization#
Here’s a text-based flowchart of the preprocessing pipeline:
Raw EEG Data
|
v
Channel Selection
|
v
Flatline Detection
|
v
Noisy Channel Removal
|
v
Channel Interpolation
|
v
Resampling
|
v
High-Pass Filtering
|
v
Low-Pass Filtering
|
v
ICA Decomposition
|
v
Component Classification (ICLabel)
|
v
Artifact Component Removal
|
v
Preprocessed EEG Data
Parameter Tuning#
Key Parameters and Their Effects#
Flatline Criterion
Default: 5 seconds
Lower values: More aggressive channel removal
Higher values: More lenient, may keep noisy channels
Recommendation: 5 seconds for most applications
ASR Criterion
Default: 20 (standard deviations)
Lower values: More aggressive artifact removal
Higher values: More lenient, may keep artifacts
Recommendation: 20 for standard EEG, 15-25 for sensitive applications
High-Pass Filter
Default: 1 Hz
Lower values: Preserve slow oscillations
Higher values: Remove more low-frequency noise
Recommendation: 0.5-1 Hz for most applications
Low-Pass Filter
Default: 100 Hz
Lower values: Remove more high-frequency noise
Higher values: Preserve high-frequency activity
Recommendation: 100 Hz for standard EEG, 200 Hz for high-frequency analysis
Resampling Rate
Default: 250 Hz
Lower values: Smaller file size, faster processing
Higher values: Better temporal resolution
Recommendation: 250 Hz for most applications
Tuning Strategy#
Start with defaults: Use the default parameters as a baseline
Visualize results: Plot the data before and after preprocessing
Adjust parameters: Modify parameters based on visual inspection
Validate: Check that preprocessing doesn’t remove important signals
Document: Record the parameters used for reproducibility
Quality Control#
Assessing Preprocessing Quality#
Visual Inspection#
Plot the data before and after preprocessing:
import matplotlib.pyplot as plt
# Plot raw data
plt.figure(figsize=(12, 6))
plt.plot(eeg.data[0, :1000])
plt.title('Raw EEG Data')
plt.show()
# Plot preprocessed data
plt.figure(figsize=(12, 6))
plt.plot(eeg.data[0, :1000])
plt.title('Preprocessed EEG Data')
plt.show()
Spectral Analysis#
Compare power spectral density before and after preprocessing:
from eegprep import eeg_rpsd
import matplotlib.pyplot as plt
# Compute power spectral density
psd = eeg_rpsd(eeg)
plt.figure(figsize=(12, 6))
plt.semilogy(psd)
plt.xlabel('Frequency (Hz)')
plt.ylabel('Power (µV²/Hz)')
plt.title('Power Spectral Density')
plt.show()
Component Inspection#
Visualize ICA components:
from eegprep import topoplot
import matplotlib.pyplot as plt
# Plot component topographies
topoplot(eeg, components=[0, 1, 2, 3])
plt.title('ICA Component Topographies')
plt.show()
# Check component classifications
if hasattr(eeg, 'etc') and 'ic_classification' in eeg.etc:
classifications = eeg.etc.ic_classification.ICLabel.classifications
for i, probs in enumerate(classifications):
print(f"Component {i}: {probs}")
Data Loss Assessment#
Check how much data was removed:
# Check removed channels
if hasattr(eeg, 'removed_channels'):
print(f"Removed channels: {eeg.removed_channels}")
# Check removed windows
if hasattr(eeg, 'removed_windows'):
print(f"Removed windows: {eeg.removed_windows}")
# Calculate percentage of data retained
if hasattr(eeg, 'removed_windows'):
pct_retained = (1 - len(eeg.removed_windows) / eeg.pnts) * 100
print(f"Data retained: {pct_retained:.1f}%")
Quality Metrics#
Compute quality metrics:
# Signal-to-noise ratio
from eegprep import eeg_rpsd
psd = eeg_rpsd(eeg)
snr = psd[1:50].mean() / psd[50:100].mean()
print(f"SNR: {snr:.2f}")
# Autocorrelation
from eegprep import eeg_autocorr
acf = eeg_autocorr(eeg, maxlag=100)
print(f"Autocorrelation: {acf}")
Common Issues and Solutions#
Too Many Channels Removed#
Problem: Preprocessing removes too many channels
Solutions:
Increase flatline criterion
Increase ASR criterion
Check data quality before preprocessing
Verify channel locations are correct
Too Few Artifacts Removed#
Problem: Preprocessing doesn’t remove enough artifacts
Solutions:
Decrease ASR criterion
Decrease flatline criterion
Apply additional filtering
Manually inspect and remove bad components
ICA Fails to Converge#
Problem: ICA decomposition doesn’t converge
Solutions:
Increase max_iter parameter
Ensure data is properly filtered
Check for remaining artifacts
Try different ICA algorithm
Next Steps#
Now that you understand the preprocessing pipeline:
Read the Configuration guide for advanced parameter tuning
Explore the BIDS Workflow for batch processing
Check the Advanced Topics for custom pipelines
Review the API Reference for detailed function documentation