dcase_models.data.AugmentedDataset

class dcase_models.data.AugmentedDataset(dataset, sr, augmentations_list)

Bases: dcase_models.data.dataset_base.Dataset

Class that manages data augmentation.

It takes an instance of Dataset and generates an augmented one. It includes methods to generate augmented versions of the audio files in the existing Dataset.

Parameters:	dataset : Dataset Instance of Dataset to be augmented.
	sr : int Sampling rate.
	augmentations_list : list of dict List of augmentation types and their parameters. Each item is a dict of the form {'type': aug_type, 'param1': param1, …}, e.g. [{'type': 'pitch_shift', 'n_semitones': -1}, {'type': 'time_stretching', 'factor': 1.05}].

Examples

Define an instance of UrbanSound8k and convert it into an augmented instance of the dataset. Note that the actual augmentation is performed only when the process() method is called.

>>> from dcase_models.data.datasets import UrbanSound8k
>>> from dcase_models.data.data_augmentation import AugmentedDataset
>>> dataset = UrbanSound8k('../datasets/UrbanSound8K')
>>> augmentations = [
...     {"type": "pitch_shift", "n_semitones": -1},
...     {"type": "time_stretching", "factor": 1.05},
...     {"type": "white_noise", "snr": 60}
... ]
>>> sr = 22050  # example value; use the sampling rate of your audio files
>>> aug_dataset = AugmentedDataset(dataset, sr, augmentations)
>>> aug_dataset.process()

__init__(dataset, sr, augmentations_list)

Initialize the AugmentedDataset.

Initialize sox Transformers for each type of augmentation.

Methods

`__init__`(dataset, sr, augmentations_list)	Initialize the AugmentedDataset.
`build`()	Builds the dataset.
`change_sampling_rate`(new_sr)	Changes the sampling rate of each wav file in audio_path.
`check_if_downloaded`()	Checks if the dataset was downloaded.
`check_sampling_rate`(sr)	Checks if dataset was resampled before.
`convert_to_wav`([remove_original])	Converts each file in the dataset to wav format.
`download`(zenodo_url, zenodo_files[, …])	Downloads and decompresses the dataset from zenodo.
`generate_file_lists`()	Create self.file_lists, a dict that includes a list of files per fold.
`get_annotations`(file_path, features)	Returns the annotations of the file in file_path.
`get_audio_paths`([sr])	Returns a list of paths to the folders that include the dataset augmented files.
`process`()	Generate augmented data for each file in dataset.
`set_as_downloaded`()	Saves a download.txt file in dataset_path as a downloaded flag.

build()

Builds the dataset.

Define specific attributes of the dataset. It’s mandatory to define audio_path, fold_list and label_list. Other attributes may be defined here (url, authors, etc.).

change_sampling_rate(new_sr)

Changes the sampling rate of each wav file in audio_path.

Creates a new folder named {audio_path}{new_sr} (e.g. audio22050), converts each wav file in audio_path to the new sampling rate, and saves the results in the new folder.

Parameters:	new_sr : int New sampling rate.

check_if_downloaded()

Checks if the dataset was downloaded.

Currently this only checks whether the download.txt file exists. Further checks may be added in the future.
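The flag-file mechanism behind check_if_downloaded() and set_as_downloaded() can be illustrated with a minimal sketch. This is not the library's implementation: the helper names `set_as_downloaded_sketch` and `check_if_downloaded_sketch` are hypothetical, and only the download.txt file name and its location in dataset_path come from the descriptions on this page.

```python
import os
import tempfile


def set_as_downloaded_sketch(dataset_path):
    """Write an empty download.txt into dataset_path as a flag (hypothetical helper)."""
    os.makedirs(dataset_path, exist_ok=True)
    with open(os.path.join(dataset_path, "download.txt"), "w"):
        pass  # an empty file is enough to act as a flag


def check_if_downloaded_sketch(dataset_path):
    """Return True if the download.txt flag exists in dataset_path (hypothetical helper)."""
    return os.path.isfile(os.path.join(dataset_path, "download.txt"))


# Demonstration in a temporary folder standing in for dataset_path.
with tempfile.TemporaryDirectory() as tmp:
    before = check_if_downloaded_sketch(tmp)
    set_as_downloaded_sketch(tmp)
    after = check_if_downloaded_sketch(tmp)
```

A flag file of this kind only records that a download finished at some point; it says nothing about whether the files are still complete.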

check_sampling_rate(sr)

Checks if dataset was resampled before.

Currently this only checks that the folder {audio_path}{sr} exists and that each wav file present in audio_path is also present in {audio_path}{sr}.

Parameters:	sr : int Sampling rate.
Returns:	bool True if the dataset was resampled before.
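The check described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the library's code: `check_sampling_rate_sketch` is a hypothetical helper, and walking subfolders recursively and matching the `.wav` extension case-insensitively are both assumptions.

```python
import os
import tempfile


def check_sampling_rate_sketch(audio_path, sr):
    """Return True if {audio_path}{sr} exists and mirrors every wav file in audio_path."""
    resampled_path = f"{audio_path}{sr}"  # e.g. ".../audio" -> ".../audio22050"
    if not os.path.isdir(resampled_path):
        return False
    for root, _dirs, files in os.walk(audio_path):
        rel_root = os.path.relpath(root, audio_path)
        for name in files:
            if not name.lower().endswith(".wav"):
                continue  # only wav files are compared
            mirrored = os.path.normpath(os.path.join(resampled_path, rel_root, name))
            if not os.path.isfile(mirrored):
                return False
    return True


# Demonstration with empty placeholder files instead of real audio.
with tempfile.TemporaryDirectory() as tmp:
    audio_path = os.path.join(tmp, "audio")
    os.makedirs(audio_path)
    open(os.path.join(audio_path, "a.wav"), "w").close()
    missing = check_sampling_rate_sketch(audio_path, 22050)

    os.makedirs(audio_path + "22050")
    open(os.path.join(audio_path + "22050", "a.wav"), "w").close()
    present = check_sampling_rate_sketch(audio_path, 22050)
```

A check like this compares file names only; it does not verify that the files in the resampled folder actually have the requested sampling rate.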

convert_to_wav(remove_original=False)

Converts each file in the dataset to wav format.

If remove_original is True, the original files will be deleted.

Parameters:	remove_original : bool Remove original files.

download(zenodo_url, zenodo_files, force_download=False)

Downloads and decompresses the dataset from zenodo.

Parameters:	zenodo_url : str URL with the zenodo files, e.g. 'https://zenodo.org/record/12345/files'.
	zenodo_files : list of str List of files, e.g. ['file1.tar.gz', 'file2.tar.gz', 'file3.tar.gz'].
	force_download : bool If True, download the dataset even if it was downloaded before.
Returns:	bool True if the downloading process was successful.

generate_file_lists()

Create self.file_lists, a dict that includes a list of files per fold.

Calls dataset.generate_file_lists() and copies the resulting attribute.

get_annotations(file_path, features)

Returns the annotations of the file in file_path.

Parameters:	file_path : str Path to the file.
	features : ndarray nD array with the features of file_path.
	time_resolution : float Time resolution of the features.
Returns:	ndarray Annotations of the file file_path. Expected output shape: (features.shape[0], len(self.label_list)).

get_audio_paths(sr=None)

Returns a list of paths to the folders that include the dataset augmented files.

The folder of each augmentation is defined using its name and parameter values.

e.g. {DATASET_PATH}/audio/pitch_shift_1, where 1 is the value of the 'n_semitones' parameter.

Parameters:	sr : int or None, optional Sampling rate. Kept for compatibility with the Dataset.get_audio_paths() method.
Returns:	audio_path : str Path to the root audio folder, e.g. {DATASET_PATH}/audio.
	subfolders : list of str List of subfolders included in the audio folder, e.g. ['{DATASET_PATH}/audio/original', '{DATASET_PATH}/audio/pitch_shift_1', '{DATASET_PATH}/audio/time_stretching_1.1'].
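One way to derive such folder names from the augmentation dicts is sketched below. The exact naming rule used by the library is not given on this page, so joining the type and the parameter values with underscores is an assumption chosen to reproduce the examples above. Both helper names are hypothetical.

```python
import os


def augmentation_folder_sketch(augmentation):
    """Build a folder name from the augmentation type and its parameter values.

    Assumption: values are appended in dict order, separated by underscores.
    """
    values = [str(value) for key, value in augmentation.items() if key != "type"]
    return "_".join([augmentation["type"]] + values)


def get_audio_paths_sketch(dataset_path, augmentations_list):
    """Return the root audio folder and one subfolder per augmentation (hypothetical)."""
    audio_path = os.path.join(dataset_path, "audio")
    subfolders = [os.path.join(audio_path, "original")]
    subfolders += [
        os.path.join(audio_path, augmentation_folder_sketch(augmentation))
        for augmentation in augmentations_list
    ]
    return audio_path, subfolders


augmentations = [
    {"type": "pitch_shift", "n_semitones": 1},
    {"type": "time_stretching", "factor": 1.1},
]
audio_path, subfolders = get_audio_paths_sketch("DATASET_PATH", augmentations)
folder_names = [os.path.basename(path) for path in subfolders]
```

Encoding the parameter values in the folder name keeps the outputs of different settings of the same augmentation type in separate folders.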

process()

Generate augmented data for each file in dataset.

Replicates the folder structure of {DATASET_PATH}/audio/original inside each augmentation folder.
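The folder replication step can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `replicate_folders_sketch` is a hypothetical helper, the fold1 and fold2 subfolders are made-up examples, and the audio processing itself is left out.

```python
import os
import tempfile


def replicate_folders_sketch(original_path, augmentation_path):
    """Recreate every subfolder of original_path under augmentation_path (no files copied)."""
    for root, _dirs, _files in os.walk(original_path):
        rel_root = os.path.relpath(root, original_path)
        target = os.path.normpath(os.path.join(augmentation_path, rel_root))
        os.makedirs(target, exist_ok=True)


# Demonstration with a temporary dataset layout.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "audio", "original")
    os.makedirs(os.path.join(original, "fold1"))  # hypothetical fold folders
    os.makedirs(os.path.join(original, "fold2"))

    target = os.path.join(tmp, "audio", "pitch_shift_1")
    replicate_folders_sketch(original, target)
    replicated = sorted(os.listdir(target))
```

Mirroring the layout this way means an augmented file can sit at the same relative path as its original.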

set_as_downloaded()

Saves a download.txt file in dataset_path as a downloaded flag.