Overview of EDF specifications
This section presents a brief overview of the EDF specifications. This overview should help understand how to use the various functions and classes in this edfrw library. The full EDF specifications can be found here
EDF files consist of a header (ascii) that describes the contents of the file and the experimental settings. The data (int16) are stored after the header in data records.
The Header
The first 256 bytes in an EDF file are the header, which contains
information about the patient, date and time of data acquisition, etc.
This is followed by another 256 bytes for each signal acquired. Signal
header(s) contain the details about the name of the signals, the
hardware used, and values to allow the transformation of raw (int16)
data values into physical values (e.g. volts). Thus, the length of the
full header (i.e. the ‘header record’) equals 256 + (number_of_signals *
256). The header record is ascii
only, and contains the following
fields:
Header record
Field | Size | Position | Notes |
---|---|---|---|
version | 8 | 0 | [1] |
patient_id | 80 | 8 | [2] |
recording_id | 80 | 88 | [3] |
startdate | 8 | 168 | dd.mm.yy |
starttime | 8 | 176 | hh.mm.ss |
number_of_bytes_in_header | 8 | 184 | |
reserved | 44 | 192 | [4] |
number_of_data_records | 8 | 236 | ‘nr’ |
duration_of_data_record | 8 | 244 | in seconds |
number_of_signals | 4 | 252 | ‘ns’ |
(total) | 256 |
Notes
-
‘version’ is always ‘0’.
-
‘patient_id’ must consist of 4 space-separated strings: Code Sex DOB Name, where
- Code is the patient code
- Sex is M, F, or X
- DOB is date of birth in format dd-MMM-yyyy
- Name is patient’s name
- (If a subfield is not known, replace with an X)
- ‘recording_id’ is a string
'Startdate dd-MMM-yyyy ExpID InvestigID Equipment'
, where
- The text “Startdate”
- dd-MMM-yyyy, the actual start date
- ExpID, code of the experiment/investigation
- InvestigID, code of responsible investigator
- Equipment, code of equipment used
- Additional optional subfields may follow the ones above
- Example: ‘Startdate 02-MAR-2002 PSG-1234=2002 NN Telemetry03’
- ‘reserved’: empty for EDF; ‘EDF+C’ for continuous recording; ‘EDF+D’ if the recording is interrupted.
Signal record
After the main header there is information about each signal acquired. This information forms part of the ‘header record’ in the specifications but it is helpful to look at it separately:
Field | Size | Position | Notes |
---|---|---|---|
label | 16 | 0 | |
transducer | 80 | 16 | [1] |
physical_dim | 8 | 96 | [2] |
physical_min | 8 | 104 | |
physical_max | 8 | 112 | |
digital_min | 8 | 120 | [3] |
digital_max | 8 | 128 | [3] |
prefiltering | 80 | 136 | [4] |
number_of_samples | 8 | 216 | |
reserved | 32 | 224 | |
(total) | 256 |
Notes
-
‘transducer’ type, e.g. ‘thermistor’.
-
‘physical_dim’ (physical dimension, e.g. ‘uV’) must start with a prefix (in this example u) followed by the basic dimension (in this example V). For full details see the EDF full specifications.
-
The digital range must be somewhere between -32768 and 32767 (because data samples are 16-bit signed integers).
-
‘prefiltering’: e.g. for high-pass, low-pass and notch filters: ‘HP:0.1 Hz LP:75 Hz N:50 Hz’.
Thus, after the main header there are 256 bytes for each signal acquired. It is worth noting that each field in the signal record holds the values for all signals (rather than the header storing one full signal record, then a second full signal record, etc). That is, if e.g. two signals are acquired, then there will be two consecutive ‘label’ fields (16 + 16 bytes), then two consecutive ‘transducer’ fields (80 + 80 bytes), then two ‘physical_dim’ fields (8 + 8 bytes), etc.
Data record
Data records follow after the header record. Here, data samples (of type int16) are stored in blocks (the data record). Each block contains the samples acquired during a period of time specified in the header as ‘duration_of_data_record’, and the total number of blocks in the file are ‘number_of_data_records’. Note that EDF allows the acquisition of signals at different sampling rates; the number of samples per signal in each data block is in the signal header (‘number_of_samples_in_data_record’).
For example, two signals signal_A
and signal_B
are acquired at 100
Hz and 5 Hz respectively. The data are saved every 20 seconds (i.e.
duration_of_data_record = 20
). Thus, one block of data (a data record)
will consist of 2000 samples (number_of_samples_in_data_record = 100 Hz
times 20 seconds = 2000) from signal_A
followed by 100 samples
(number_of_samples_in_data_record = 5 Hz times 20 seconds = 100) from
signal_B
. If the header indicates that there are 70 such blocks
(number_of_data_records = 70
), then the total duration of the
recording would be 70 x 20 = 1400 seconds (number_of_data_records
x duration_of_data_record
).
Converting digital samples to physical dimensions
Data samples are stored as 16-bit (2-byte signed, little endian, two’s complement) integers. An easy way to convert those values to their physical equivalent is by using the equation for a straight line with the signal information stored in the EdfSignal record.
(Note that this conversion is done automatically by the function edfrw.headers.EdfSignal.dig_to_phys so typically it is not necessary to worry about this. The procedure is documented here for completeness.)
The slope m (or gain) of a straight line is the ratio of change in y by change in x:
m = (y1 - y0) / (x1 - x0)
and if the slope m and the intercept b are known, then the line can be described by:
y = m * (x + b)
It can be seen that the raw int16 data values stored in an EDF file correspond to x in that equation, that the physical values that we are looking for are y, and that these two are related by the parameters set in the EdfSignal record.
The slope can be calculated as:
m = (y1 - y0) / (x1 - x0)
m = (physical_max - physical_min) / (digital_max - digital_min)
and the offset (or intercept) b will be the physical_min value. From these the physical values can be obtained using the line equation:
b = offset = physical_max / m - digital_max
y = m * (x + b)
physical_value = m * (digital_value + b)
Example 1
An EDF file contains data obtained after measuring voltage with the adc from the mbed LPC1768. The native EDF data are stored as 2-byte integer digital samples. The mbed has an 12-bit adc, so its digital range is from 0 to 4095, and the reference voltage in the mbed is 3.3 V, so the physical range that the adc can measure is 0 V to 3.3 V. Thus, the header record in such EDF file would be:
physical_dim = 'V'
physical_min = 0
physical_max = 3.3
digital_min = 0
digital_max = 4095
These parameters are used to calculate the gain m (slope):
m = (y1 - y0) / (x1 - x0)
m = (physical_max - physical_min) / (digital_max - digital_min)
m = (3.3 - 0) / (4095 - 0)
m = 0.0008
b = physical_max / m - digital_max
b = 3.3 / 0.0008 - 4095
b = 0.5
and with that the physical values (voltage):
physical_value = m * (digital_value + b)
physical_value = 0.0008 * (digital_value + 0.5)
digital value of 2048 will represent 0.0008 * (2048 + 0) = 1.65 volts, as expected.
Example 2
EEG data are acquired using a commercial system. The manufacturer explains in the documentation that the analog outputs from their hardware are signals that range between 0 and 5 volts, and are centred at 2.048 V, so:
physical_dim = 'V'
physical_min = 0 - 2.048 = -2.048
physical_max = 5 - 2.048 = 2.952
If these signals were acquired with a 14-bit ADC, then:
digital_min = 0
digital_max = 2**14 - 1 = 16383
and thus:
m = (physical_max - physical_min) / (digital_max - digital_min)
m = (2.952 + 2.048) / (16383 - 0)
m = 5 / 16383 = 0.00031
b = offset = physical_max / m - digital_max = -23093.4768
y = m * x + b
physical_value = 0.00031 * (digital_value + -23093.4768)