Large-Dataset Storage

EEGPrep keeps normal EEG dictionaries as the public API while supporting large-dataset workflows through explicit Python storage handles. The runtime does not depend on EEGLAB’s MATLAB @memmapdata or @mmo classes.

Two-File .set / .fdt Datasets

pop_saveset saves one-file .set files by default. To write a two-file dataset (a .set header plus a float32 .fdt data sidecar), pass savemode="twofiles" or set EEG_OPTIONS["option_savetwofiles"] = 1:

from eegprep import EEG_OPTIONS, pop_loadset, pop_saveset

pop_saveset(EEG, "subject01.set", savemode="twofiles")

EEG_OPTIONS["option_savetwofiles"] = 1
pop_saveset(EEG, "subject02.set")

The .fdt sidecar uses EEGLAB’s channel-fast float32 layout. Continuous data round-trips as (nbchan, pnts) and epoched data round-trips as (nbchan, pnts, trials).
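The channel-fast layout can be illustrated with plain NumPy. This is a minimal sketch and not EEGPrep's own writer: it assumes the sidecar is raw little-endian float32 with no header, and the helper names write_fdt and read_fdt are hypothetical. "Channel-fast" here means the channel index varies fastest on disk, which corresponds to column-major (Fortran) order in NumPy.

```python
import os
import tempfile

import numpy as np


def write_fdt(path, data):
    """Hypothetical helper: write float32 samples with the channel index varying fastest."""
    # Column-major flattening makes the first axis (channels) vary fastest on disk.
    np.asarray(data, dtype="<f4").ravel(order="F").tofile(path)


def read_fdt(path, shape):
    """Hypothetical helper: read a sidecar back as (nbchan, pnts) or (nbchan, pnts, trials)."""
    flat = np.fromfile(path, dtype="<f4")
    return flat.reshape(shape, order="F")


with tempfile.TemporaryDirectory() as tmp:
    continuous = np.arange(6, dtype=np.float32).reshape(2, 3)   # (nbchan, pnts)
    epoched = np.arange(24, dtype=np.float32).reshape(2, 3, 4)  # (nbchan, pnts, trials)

    cont_path = os.path.join(tmp, "continuous.fdt")
    epo_path = os.path.join(tmp, "epoched.fdt")
    write_fdt(cont_path, continuous)
    write_fdt(epo_path, epoched)

    # Raw sample order on disk: both channels of sample 0, then sample 1, and so on.
    on_disk_order = np.fromfile(cont_path, dtype="<f4")

    continuous_back = read_fdt(cont_path, continuous.shape)
    epoched_back = read_fdt(epo_path, epoched.shape)
```

Reading with the same shape and order="F" recovers the original array, which is why the same file can hold either continuous or epoched data as long as the header records the dimensions.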

When an existing two-file dataset is saved with savemode="resave", EEGPrep keeps writing the same .fdt sidecar. A plain save without savemode follows the current option_savetwofiles setting; if that option is disabled, the data is saved inline in the .set file.

Memory-Mapped Data

When EEG_OPTIONS["option_memmapdata"] = 1, pop_loadset loads two-file datasets through a NumPy-compatible MemmapData handle instead of copying the full sidecar into memory:

EEG_OPTIONS["option_memmapdata"] = 1
EEG = pop_loadset("subject01.set")

first_channel = EEG["data"][0, :]
EEG["data"][0, 0] = 0
EEG["data"].flush()

Single-file .set datasets still load as in-memory NumPy arrays because no separate data file exists to map. Mutating a MemmapData value writes to the .fdt sidecar; use normal EEGPrep save/history workflows when the dataset metadata should be marked clean.
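The write-through behavior described above is the same behavior NumPy's own np.memmap exhibits. The following sketch uses np.memmap directly on a temporary file as an analogy; it is not the MemmapData class, and it assumes a raw little-endian float32 file in channel-fast order. The file name demo.fdt is a placeholder.

```python
import os
import tempfile

import numpy as np

nbchan, pnts = 4, 1000

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.fdt")
    # Create a small sidecar-like file filled with ones, channel index varying fastest.
    np.ones((nbchan, pnts), dtype="<f4").ravel(order="F").tofile(path)

    # Map the file instead of reading it; order="F" matches the channel-fast layout.
    mapped = np.memmap(path, dtype="<f4", mode="r+", shape=(nbchan, pnts), order="F")

    first_channel = np.array(mapped[0, :])  # copy only the requested slice into memory
    mapped[0, 0] = 0                        # assignment targets the mapped file
    mapped.flush()                          # push pending changes to disk
    del mapped                              # release the mapping before cleanup

    # Re-read the file independently to confirm the change reached the disk.
    reread = np.fromfile(path, dtype="<f4").reshape((nbchan, pnts), order="F")
    first_value = float(reread[0, 0])
    second_value = float(reread[0, 1])
```

Because the assignment modifies the file itself, an edit made through a mapped array persists even if the dataset is never explicitly saved, which is the reason to treat such mutations with care.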

Storedisk Sessions

EEG_OPTIONS["option_storedisk"] = 1 keeps the currently selected dataset resident and evicts saved, non-current datasets from ALLEEG. Each evicted dataset holds an OffloadedData handle that records its saved .set path and shape metadata. Accessing samples through that handle raises a clear error; retrieve the dataset first:

EEG_OPTIONS["option_storedisk"] = 1
ALLEEG, EEG, CURRENTSET = eeg_store(ALLEEG, EEG, 0)
EEG, ALLEEG, CURRENTSET = eeg_retrieve(ALLEEG, 1)

The GUI, EEGPrepSession, and eegprep-console use the same eeg_store/eeg_retrieve path, so EEG, ALLEEG, CURRENTSET, history, and dataset menus stay synchronized. Unsaved resident datasets cannot be offloaded; save them first or keep option_storedisk disabled.
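To make the "metadata stays, samples are refused" idea concrete, here is a toy sketch of such a handle. It is purely illustrative: the class OffloadedPlaceholder, its fields, the exception type, and the error message are all hypothetical and are not EEGPrep's OffloadedData implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OffloadedPlaceholder:
    """Hypothetical stand-in for the data field of an evicted dataset."""

    set_path: str
    shape: Tuple[int, ...]

    def __getitem__(self, index):
        # Sample access is refused; only the path and shape metadata remain available.
        raise RuntimeError(
            f"Data for {self.set_path!r} is offloaded to disk; "
            "retrieve the dataset before reading samples."
        )


placeholder = OffloadedPlaceholder("subject01.set", (64, 250000))

try:
    placeholder[0, :]
except RuntimeError as err:
    message = str(err)
```

Failing loudly on indexing, rather than returning an empty array, keeps an evicted dataset from being silently processed as if it contained no samples.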

Current Limitations

As of Phase 5, pop_loadset supports full dataset loading only. EEGLAB's channel-only and loadmode="info" paths fail with an explicit error instead of pretending data is available. Derived caches such as icaact are not managed by a separate lazy-storage layer, and EEGPrep does not provide multi-process write coordination for shared .fdt files.