dcase_models.data.DataGenerator

class dcase_models.data.DataGenerator(dataset, inputs, folds, outputs='annotations', batch_size=32, shuffle=True, train=True, scaler=None, scaler_outputs=None)[source]

Bases: object

Includes methods to load features files from DCASE datasets.

Parameters:
dataset : Dataset

Instance of the Dataset used to load the data. Note that the dataset has to be downloaded before initializing the DataGenerator. Refer to dcase-models/data/datasets.py for a complete list of available datasets.

inputs : instance of FeatureExtractor or list of FeatureExtractor instances

Instance(s) of FeatureExtractor. These are the feature extractor(s) used to generate the features. For multi-input, pass a list of FeatureExtractor instances.

folds : list of str

List of folds to be loaded, e.g. ['fold1', 'fold2', 'fold3', …]. Each fold has to be in dataset.fold_list. Because different folds are used at each stage of the pipeline (training, validation, evaluation), a separate DataGenerator instance has to be created for each stage.

outputs : str, FeatureExtractor or list, default='annotations'

Instance(s) of FeatureExtractor used to generate the outputs. To use the annotations obtained from Dataset, use a string. For multi-output, use a list of FeatureExtractor and/or strings.

batch_size : int, default=32

Number of files loaded when get_data_batch() is called. Note that the meaning of batch_size here is slightly different from the one in machine learning libraries like keras. In those libraries, batch_size means the number of instances (sequences in DCASE-models) used in each training step. Here, batch_size is the number of files, and therefore the number of sequences varies from batch to batch.

shuffle : bool, default=True

When training a model, it is typical to shuffle the dataset at the end of each epoch. If shuffle is True (default), the audio file list is shuffled when the class is initialized and whenever the shuffle_list() method is called.

train : bool, default=True

When training, it is typical to feed the model with a numpy array that contains all the data concatenated. For validation and testing, the features of each file have to be kept separate in order to do a file-wise evaluation. Therefore, if train is True, the loaded data is concatenated and converted to a numpy array. If train is False, get_data() and get_data_batch() return lists whose elements are the features of each file in audio_file_list.

scaler : Scaler or None, default=None

If not None, the Scaler object is used to scale the data after loading.

scaler_outputs : Scaler or None, default=None

Same as scaler but for the system outputs.
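To illustrate the batch_size note above, the following sketch uses plain Python rather than the DCASE-models API. The per-file sequence counts and the helper name sequences_per_batch are made up for illustration. It only shows that grouping a file list in batches of batch_size files yields a different number of sequences in each batch.

```python
# Hypothetical number of sequences extracted from each of seven files.
# Real counts depend on the file duration and the feature extractor settings.
sequences_per_file = [7, 3, 5, 7, 2, 4, 6]


def sequences_per_batch(counts, batch_size):
    """Group files in batches of `batch_size` files and count the sequences."""
    totals = []
    for start in range(0, len(counts), batch_size):
        batch = counts[start:start + batch_size]  # at most batch_size files
        totals.append(sum(batch))
    return totals


batch_totals = sequences_per_batch(sequences_per_file, batch_size=3)
print(batch_totals)  # [15, 13, 6]
```

Every batch except possibly the last holds exactly three files, yet the sequence totals differ. This is the same effect seen in the Examples section, where a batch of 32 files produces 212 training sequences.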

See also

Dataset
Dataset class
FeatureExtractor
FeatureExtractor class

Examples

Create instances of Dataset and FeatureExtractor with default parameters

>>> from dcase_models.data.datasets import UrbanSound8k
>>> from dcase_models.data.features import MelSpectrogram
>>> from dcase_models.data.data_generator import DataGenerator
>>> dataset = UrbanSound8k('../datasets/UrbanSound8k')
>>> features = MelSpectrogram()

Assuming that the dataset was downloaded and features were extracted already, we can initialize the data generators. This example uses fold1 and fold2 for training and fold3 for validation.

>>> data_gen_train = DataGenerator(
...     dataset, features, ['fold1', 'fold2'], train=True)
>>> data_gen_val = DataGenerator(
...     dataset, features, ['fold3'], train=False)
>>> X_train, Y_train = data_gen_train.get_data_batch(0)
>>> print(X_train.shape, Y_train.shape)
    (212, 43, 64) (212, 10)
>>> X_val, Y_val = data_gen_val.get_data_batch(0)
>>> print(len(X_val), len(Y_val))
    32 32
>>> print(X_val[0].shape, Y_val[0].shape)
    (7, 43, 64) (7, 10)
>>> X_train, Y_train = data_gen_train.get_data()
>>> print(X_train.shape, Y_train.shape)
    (11095, 43, 64) (11095, 10)
>>> X_val, Y_val = data_gen_val.get_data()
>>> print(len(X_val), len(Y_val))
    925 925
>>> print(X_val[0].shape, Y_val[0].shape)
    (7, 43, 64) (7, 10)
Attributes:
audio_file_list : list of dict

List of audio files from which the features will be loaded. Each element in the list includes information about the original audio file (needed to get the annotations) and the subfolder where the resampled (and possibly augmented) audio file is located, e.g.:

audio_file_list = [
    {'file_original': 'audio/1.wav', 'sub_folder': 'original'},
    {'file_original': 'audio/1.wav', 'sub_folder': 'pitch_shift_1'},
    {'file_original': 'audio/2.wav', 'sub_folder': 'original'},
    …
]

__init__(dataset, inputs, folds, outputs='annotations', batch_size=32, shuffle=True, train=True, scaler=None, scaler_outputs=None)[source]

Initialize the DataGenerator.

Generates the audio_file_list by concatenating all the files from the folds passed as an argument.
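A minimal sketch of how such a list could be assembled from the selected folds is given below. It is an illustration under stated assumptions, not the library's implementation: the file_lists mapping, the build_audio_file_list helper, and its sub_folders parameter are all hypothetical names.

```python
# Hypothetical mapping from fold name to the audio files in that fold.
file_lists = {
    'fold1': ['audio/1.wav', 'audio/2.wav'],
    'fold2': ['audio/3.wav'],
    'fold3': ['audio/4.wav'],
}


def build_audio_file_list(file_lists, folds, sub_folders=('original',)):
    """Concatenate the files of the selected folds, one entry per sub-folder."""
    audio_file_list = []
    for fold in folds:
        for file_original in file_lists[fold]:
            for sub_folder in sub_folders:
                audio_file_list.append(
                    {'file_original': file_original, 'sub_folder': sub_folder}
                )
    return audio_file_list


audio_file_list = build_audio_file_list(file_lists, ['fold1', 'fold2'])
print(len(audio_file_list))  # 3
print(audio_file_list[0])
# {'file_original': 'audio/1.wav', 'sub_folder': 'original'}
```

Files from fold3 are left out because that fold was not requested, which is why a separate generator is needed for each stage of the pipeline.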

Methods

__init__(dataset, inputs, folds[, outputs, …]) Initialize the DataGenerator.
convert_audio_path_to_features_path(…[, …]) Converts audio path(s) to features path(s).
convert_features_path_to_audio_path(…[, sr]) Converts features path(s) to audio path(s).
get_data() Return all data from the selected folds.
get_data_batch(index) Return the data from the batch given by argument.
get_data_from_file(file_index) Returns the data from the file index given by argument.
paths_remove_aug_subfolder(path) Removes the subfolder string related to augmentation from a path.
set_scaler(scaler) Set scaler object.
set_scaler_outputs(scaler_outputs) Set the scaler object used for the outputs.
shuffle_list() Shuffles audio_file_list.
convert_audio_path_to_features_path(audio_file, features_path, subfolder='')[source]

Converts audio path(s) to features path(s).

Parameters:
audio_file : str or list of str

Path(s) to the audio file(s).

Returns:
features_file : str or list of str

Path(s) to the features file(s).

convert_features_path_to_audio_path(features_file, features_path, sr=None)[source]

Converts features path(s) to audio path(s).

Parameters:
features_file : str or list of str

Path(s) to the features file(s).

Returns:
audio_file : str or list of str

Path(s) to the audio file(s).

get_data()[source]

Return all data from the selected folds.

If train was set to True, the output is concatenated and converted to a numpy array. Otherwise, the outputs are lists whose elements are the features of each file.

Returns:
X : list or ndarray

List or array of features for each file.

Y : list or ndarray

List or array of annotations for each file.

get_data_batch(index)[source]

Return the data from the batch given by argument.

If train was set to True, the output is concatenated and converted to a numpy array. Otherwise, the outputs are lists whose elements are the features of each file.

Returns:
X : list or ndarray

List or array of features for each file.

Y : list or ndarray

List or array of annotations for each file.

get_data_from_file(file_index)[source]

Returns the data from the file at the given index in audio_file_list.

Returns:
X : ndarray

Array of features for the file.

Y : ndarray

Array of annotations for the file.

paths_remove_aug_subfolder(path)[source]

Removes the subfolder string related to augmentation from a path.

Converts DATASET_PATH/audio/original/… into DATASET_PATH/audio/…

Parameters:
path : str or list of str

Path to be converted.

Returns:
features_file : str or list of str

Path(s) to the features file(s).
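The conversion described above can be sketched with plain string handling. This is an illustration only, not the library's code: the function name remove_subfolder, its explicit subfolder argument, and the example path are assumptions, and the real method does not take the subfolder name as a parameter.

```python
def remove_subfolder(path, subfolder='original'):
    """Drop the first path component equal to `subfolder`.

    Accepts a single path or a list of paths, mirroring the
    'str or list of str' convention used in this class.
    """
    if isinstance(path, list):
        return [remove_subfolder(p, subfolder) for p in path]
    parts = path.split('/')
    if subfolder in parts:
        parts.remove(subfolder)  # removes only the first occurrence
    return '/'.join(parts)


# Hypothetical path used only for this example.
print(remove_subfolder('DATASET_PATH/audio/original/fold1/1.wav'))
# DATASET_PATH/audio/fold1/1.wav
```

Paths that do not contain the subfolder are returned unchanged, so the helper is safe to apply to an already converted path.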

set_scaler(scaler)[source]

Set scaler object.

set_scaler_outputs(scaler_outputs)[source]

Set the scaler object used for the outputs.

shuffle_list()[source]

Shuffles audio_file_list.

Notes

Only shuffle the list if shuffle is True.
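The behaviour in the note above, together with the shuffle at initialization described for the shuffle parameter, can be sketched with a small stand-in class. FileListShuffler is a hypothetical name and is not part of DCASE-models; it only mimics the flag check.

```python
import random


class FileListShuffler:
    """Stand-in for the shuffling behaviour described above (not the real class)."""

    def __init__(self, audio_file_list, shuffle=True):
        self.audio_file_list = list(audio_file_list)
        self.shuffle = shuffle
        self.shuffle_list()  # the list is also shuffled at initialization

    def shuffle_list(self):
        # Only reorder the list when shuffling is enabled.
        if self.shuffle:
            random.shuffle(self.audio_file_list)


files = ['audio/%d.wav' % i for i in range(1, 6)]

fixed = FileListShuffler(files, shuffle=False)
for epoch in range(3):
    # ... train on the data for one epoch ...
    fixed.shuffle_list()  # no effect because shuffle is False
print(fixed.audio_file_list == files)  # True

shuffled = FileListShuffler(files, shuffle=True)
print(sorted(shuffled.audio_file_list) == files)  # True: same files, possibly reordered
```

Calling shuffle_list() unconditionally at the end of each epoch is therefore harmless for validation or test generators created with shuffle=False.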