dcase_models.data.AugmentedDataset

class dcase_models.data.AugmentedDataset(dataset, sr, augmentations_list)

Bases: dcase_models.data.dataset_base.Dataset

Class that manages data augmentation.

It takes an instance of Dataset and generates an augmented one. It includes methods to generate augmented versions of the audio files in the existing Dataset.

Parameters:
dataset : Dataset

Instance of Dataset to be augmented.

augmentations_list : list

List of augmentation types and their parameters. Each entry is a dict, so the list has the form [{'type': aug_type, 'param1': param1, …}, …]. e.g.:

[
    {'type': 'pitch_shift', 'n_semitones': -1},
    {'type': 'time_stretching', 'factor': 1.05}
]
sr : int

Sampling rate
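
The expected form of augmentations_list can be checked before constructing the class. The following is a minimal sketch, assuming only what is stated above (a list of dicts, each with a 'type' key); validate_augmentations is a hypothetical helper and is not part of dcase_models.

```python
def validate_augmentations(augmentations_list):
    """Check that every entry is a dict with a string 'type' key.

    Hypothetical helper for illustration; not part of dcase_models.
    """
    if not isinstance(augmentations_list, list):
        raise TypeError("augmentations_list must be a list of dicts")
    for index, augmentation in enumerate(augmentations_list):
        if not isinstance(augmentation, dict):
            raise TypeError(f"entry {index} is not a dict")
        if not isinstance(augmentation.get("type"), str):
            raise ValueError(f"entry {index} has no string 'type' key")
    return True


augmentations = [
    {"type": "pitch_shift", "n_semitones": -1},
    {"type": "time_stretching", "factor": 1.05},
]
is_valid = validate_augmentations(augmentations)
```

The sketch does not check the parameter names of each augmentation type, since the accepted parameters are not listed here.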

Examples

Define an instance of UrbanSound8k and convert it into an augmented instance of the dataset. Note that the actual augmentation is performed only when the process() method is called.

>>> from dcase_models.data.datasets import UrbanSound8k
>>> from dcase_models.data.data_augmentation import AugmentedDataset
>>> dataset = UrbanSound8k('../datasets/UrbanSound8K')
>>> augmentations = [
...     {"type": "pitch_shift", "n_semitones": -1},
...     {"type": "time_stretching", "factor": 1.05},
...     {"type": "white_noise", "snr": 60}
... ]
>>> sr = 22050  # sampling rate, chosen here for illustration
>>> aug_dataset = AugmentedDataset(dataset, sr, augmentations)
>>> aug_dataset.process()
__init__(dataset, sr, augmentations_list)

Initialize the AugmentedDataset.

Initialize sox Transformers for each type of augmentation.

Methods

__init__(dataset, sr, augmentations_list) Initialize the AugmentedDataset.
build() Builds the dataset.
change_sampling_rate(new_sr) Changes the sampling rate of each wav file in audio_path.
check_if_downloaded() Checks if the dataset was downloaded.
check_sampling_rate(sr) Checks if dataset was resampled before.
convert_to_wav([remove_original]) Converts each file in the dataset to wav format.
download(zenodo_url, zenodo_files[, …]) Downloads and decompresses the dataset from zenodo.
generate_file_lists() Create self.file_lists, a dict that includes a list of files per fold.
get_annotations(file_path, features, …) Returns the annotations of the file in file_path.
get_audio_paths([sr]) Returns a list of paths to the folders that include the dataset augmented files.
process() Generate augmented data for each file in dataset.
set_as_downloaded() Saves a download.txt file in dataset_path as a downloaded flag.
build()

Builds the dataset.

Define specific attributes of the dataset. It’s mandatory to define audio_path, fold_list and label_list. Other attributes may be defined here (url, authors, etc.).

change_sampling_rate(new_sr)

Changes the sampling rate of each wav file in audio_path.

Creates a new folder named {audio_path}{new_sr} (e.g. audio22050), converts each wav file in audio_path to the new sampling rate, and saves the results in the new folder.

Parameters:
new_sr : int

New sampling rate.
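
The folder naming rule can be sketched as follows. This is an illustration of the audio_path-plus-rate convention described above, not the library's code; resampled_folder is a hypothetical name, and stripping a trailing path separator is an assumption.

```python
import os


def resampled_folder(audio_path, new_sr):
    """Return audio_path with the sampling rate appended, e.g. 'audio22050'."""
    # Assumption: a trailing separator is ignored, so "audio/" and "audio"
    # map to the same resampled folder.
    return f"{audio_path.rstrip(os.sep)}{new_sr}"


target = resampled_folder(os.path.join("UrbanSound8K", "audio"), 22050)
```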

check_if_downloaded()

Checks if the dataset was downloaded.

Only checks whether the download.txt file exists.

Further checks may be added in the future.
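
The flag-file mechanism shared by check_if_downloaded() and set_as_downloaded() can be sketched with the standard library. The function names mark_downloaded and is_downloaded are hypothetical, and the content written to download.txt is an assumption; only the file name comes from this page.

```python
import os
import tempfile

FLAG_NAME = "download.txt"  # file name taken from the description above


def mark_downloaded(dataset_path):
    """Write the flag file (hypothetical stand-in for set_as_downloaded)."""
    with open(os.path.join(dataset_path, FLAG_NAME), "w") as flag:
        flag.write("downloaded")  # assumption: the content is not specified


def is_downloaded(dataset_path):
    """Return True if the flag file exists (stand-in for check_if_downloaded)."""
    return os.path.isfile(os.path.join(dataset_path, FLAG_NAME))


with tempfile.TemporaryDirectory() as dataset_path:
    before = is_downloaded(dataset_path)
    mark_downloaded(dataset_path)
    after = is_downloaded(dataset_path)
```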

check_sampling_rate(sr)

Checks if dataset was resampled before.

Currently this only checks that the folder {audio_path}{sr} exists and that each wav file present in audio_path is also present in {audio_path}{sr}.

Parameters:
sr : int

Sampling rate.

Returns:
bool

True if the dataset was resampled before.
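
The two conditions described for check_sampling_rate can be sketched as below. This is a simplified illustration, assuming a flat audio folder with no subfolders; was_resampled is a hypothetical name and the library's own traversal may differ.

```python
import os
import tempfile


def was_resampled(audio_path, sr):
    """Return True if {audio_path}{sr} exists and mirrors every wav file."""
    resampled_path = f"{audio_path}{sr}"
    if not os.path.isdir(resampled_path):
        return False
    # Assumption: only the top level of audio_path is inspected.
    wav_files = [name for name in os.listdir(audio_path) if name.endswith(".wav")]
    return all(
        os.path.isfile(os.path.join(resampled_path, name)) for name in wav_files
    )


with tempfile.TemporaryDirectory() as root:
    audio_path = os.path.join(root, "audio")
    os.makedirs(audio_path)
    open(os.path.join(audio_path, "a.wav"), "w").close()

    missing_folder = was_resampled(audio_path, 22050)  # no audio22050 yet

    os.makedirs(audio_path + "22050")
    missing_file = was_resampled(audio_path, 22050)  # folder exists, file absent

    open(os.path.join(audio_path + "22050", "a.wav"), "w").close()
    complete = was_resampled(audio_path, 22050)
```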

convert_to_wav(remove_original=False)

Converts each file in the dataset to wav format.

If remove_original is True, the original files will be deleted.

Parameters:
remove_original : bool

If True, remove the original files after conversion.

download(zenodo_url, zenodo_files, force_download=False)

Downloads and decompresses the dataset from zenodo.

Parameters:
zenodo_url : str

URL with the zenodo files. e.g. ‘https://zenodo.org/record/12345/files’

zenodo_files : list of str

List of files. e.g. [‘file1.tar.gz’, ‘file2.tar.gz’, ‘file3.tar.gz’]

force_download : bool

If True, download the dataset even if it was downloaded before.

Returns:
bool

True if the downloading process was successful.

generate_file_lists()

Create self.file_lists, a dict that includes a list of files per fold.

Calls dataset.generate_file_lists() and copies the resulting file_lists attribute.

get_annotations(file_path, features, time_resolution)

Returns the annotations of the file in file_path.

Parameters:
file_path : str

Path to the file

features : ndarray

nD array with the features of file_path

time_resolution : float

Time resolution of the features

Returns:
ndarray

Annotations of the file file_path. Expected output shape: (features.shape[0], len(self.label_list))
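
The expected output shape can be illustrated with a dependency-free sketch. It assumes a single clip-level label that is one-hot encoded and repeated for each feature frame; the real method returns an ndarray and derives the annotation from file_path, which is dataset-specific and not reproduced here. The function name and the label names are hypothetical.

```python
def clip_level_annotations(n_frames, label, label_list):
    """Return n_frames rows, each a one-hot encoding of label over label_list."""
    one_hot = [1.0 if name == label else 0.0 for name in label_list]
    # One copy of the row per frame, i.e. shape (n_frames, len(label_list)).
    return [list(one_hot) for _ in range(n_frames)]


label_list = ["dog_bark", "siren", "street_music"]  # hypothetical labels
n_frames = 4  # plays the role of features.shape[0]
annotations = clip_level_annotations(n_frames, "siren", label_list)
shape = (len(annotations), len(annotations[0]))
```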

get_audio_paths(sr=None)

Returns a list of paths to the folders that include the dataset augmented files.

The folder of each augmentation is defined using its name and parameter values.

e.g. {DATASET_PATH}/audio/pitch_shift_1 where 1 is the ‘n_semitones’ parameter.

Parameters:
sr : int or None, optional

Sampling rate (optional). This parameter is kept only for compatibility with the Dataset.get_audio_paths() method.

Returns:
audio_path : str

Path to the root audio folder. e.g. DATASET_PATH/audio

subfolders : list of str

List of subfolders included in the audio folder. e.g.:

[
    '{DATASET_PATH}/audio/original',
    '{DATASET_PATH}/audio/pitch_shift_1',
    '{DATASET_PATH}/audio/time_stretching_1.1',
]
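
A minimal sketch of how such folder names could be derived from the augmentation dicts is shown below. The rule used (type name followed by the parameter values, joined with underscores) is an assumption that reproduces the example paths above; the exact formatting used by the library, for instance for negative values, is not stated on this page. Both function names are hypothetical.

```python
import os


def augmentation_folder(augmentation):
    """Build a folder name such as 'pitch_shift_1' from one augmentation dict."""
    values = [str(value) for key, value in augmentation.items() if key != "type"]
    return "_".join([augmentation["type"]] + values)


def audio_subfolders(audio_path, augmentations_list):
    """Return the 'original' folder followed by one folder per augmentation."""
    names = ["original"] + [augmentation_folder(a) for a in augmentations_list]
    return [os.path.join(audio_path, name) for name in names]


subfolders = audio_subfolders(
    "audio",
    [
        {"type": "pitch_shift", "n_semitones": 1},
        {"type": "time_stretching", "factor": 1.1},
    ],
)
```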
process()

Generate augmented data for each file in dataset.

Replicates the folder structure of {DATASET_PATH}/audio/original in the folder of each augmentation.
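
The folder replication step can be sketched with os.walk. This only creates the directories and does not generate any augmented audio; replicate_structure is a hypothetical helper, and the folder names in the usage part are illustrative.

```python
import os
import tempfile


def replicate_structure(source_root, target_root):
    """Create under target_root every directory found under source_root."""
    for current, _dirs, _files in os.walk(source_root):
        relative = os.path.relpath(current, source_root)
        # normpath removes the trailing "." produced for the root itself.
        destination = os.path.normpath(os.path.join(target_root, relative))
        os.makedirs(destination, exist_ok=True)


with tempfile.TemporaryDirectory() as audio_path:
    original = os.path.join(audio_path, "original")
    os.makedirs(os.path.join(original, "fold1"))
    os.makedirs(os.path.join(original, "fold2"))

    target = os.path.join(audio_path, "pitch_shift_1")
    replicate_structure(original, target)
    created = sorted(os.listdir(target))
```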

set_as_downloaded()

Saves a download.txt file in dataset_path as a downloaded flag.