dcase_models.data.AugmentedDataset
class dcase_models.data.AugmentedDataset(dataset, sr, augmentations_list)
Bases: dcase_models.data.dataset_base.Dataset
Class that manages data augmentation.
Basically, it takes an instance of Dataset and generates an augmented one. Includes methods to generate data augmented versions of the audio files in an existing Dataset.
Parameters: - dataset : Dataset
Instance of Dataset to be augmented.
- augmentations_list : list
List of augmentation types and their parameters. List of dicts of the form: [{'type': aug_type, 'param1': param1 …} …]. e.g.:
[ {'type': 'pitch_shift', 'n_semitones': -1}, {'type': 'time_stretching', 'factor': 1.05} ]
- sr : int
Sampling rate
Examples
Define an instance of UrbanSound8k and convert it into an augmented instance of the dataset. Note that the actual augmentation is performed only when the process() method is called.
>>> from dcase_models.data.datasets import UrbanSound8k
>>> from dcase_models.data.data_augmentation import AugmentedDataset
>>> dataset = UrbanSound8k('../datasets/UrbanSound8K')
>>> augmentations = [
...     {"type": "pitch_shift", "n_semitones": -1},
...     {"type": "time_stretching", "factor": 1.05},
...     {"type": "white_noise", "snr": 60}
... ]
>>> sr = 22050  # sampling rate required by the signature; value chosen for illustration
>>> aug_dataset = AugmentedDataset(dataset, sr, augmentations)
>>> aug_dataset.process()
-
__init__(dataset, sr, augmentations_list)
Initialize the AugmentedDataset.
Initialize sox Transformers for each type of augmentation.
Methods
__init__(dataset, sr, augmentations_list): Initialize the AugmentedDataset.
build(): Builds the dataset.
change_sampling_rate(new_sr): Changes the sampling rate of each wav file in audio_path.
check_if_downloaded(): Checks if the dataset was downloaded.
check_sampling_rate(sr): Checks if dataset was resampled before.
convert_to_wav([remove_original]): Converts each file in the dataset to wav format.
download(zenodo_url, zenodo_files[, …]): Downloads and decompresses the dataset from zenodo.
generate_file_lists(): Create self.file_lists, a dict that includes a list of files per fold.
get_annotations(file_path, features): Returns the annotations of the file in file_path.
get_audio_paths([sr]): Returns a list of paths to the folders that include the dataset augmented files.
process(): Generate augmented data for each file in dataset.
set_as_downloaded(): Saves a download.txt file in dataset_path as a downloaded flag.
-
build()
Builds the dataset.
Define specific attributes of the dataset. It’s mandatory to define audio_path, fold_list and label_list. Other attributes may be defined here (url, authors, etc.).
-
change_sampling_rate(new_sr)
Changes the sampling rate of each wav file in audio_path.
Creates a new folder named audio_path{new_sr} (e.g. audio22050), converts each wav file in audio_path, and saves the result in the new folder.
Parameters: - new_sr : int
Sampling rate.
-
check_if_downloaded()
Checks if the dataset was downloaded.
Currently this only checks whether the download.txt file exists. Further checks may be added in the future.
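The flag-file mechanism shared by check_if_downloaded() and set_as_downloaded() can be sketched with the standard library alone. This is a minimal illustration and not the library's implementation: the helper names mark_downloaded and is_downloaded are hypothetical, and the assumption that the flag file is empty is ours.

```python
import os
import tempfile


def mark_downloaded(dataset_path):
    """Write a download.txt flag file into dataset_path (hypothetical helper)."""
    os.makedirs(dataset_path, exist_ok=True)
    with open(os.path.join(dataset_path, "download.txt"), "w") as f:
        f.write("")  # assumption: only the file's existence matters


def is_downloaded(dataset_path):
    """Return True if the download.txt flag file exists in dataset_path."""
    return os.path.isfile(os.path.join(dataset_path, "download.txt"))


# Demonstrate the check before and after the flag is written.
with tempfile.TemporaryDirectory() as tmp:
    before = is_downloaded(tmp)
    mark_downloaded(tmp)
    after = is_downloaded(tmp)
```

Because only the presence of the file is tested, a partially extracted dataset would still count as downloaded, which is presumably what the planned further checks would address.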
-
check_sampling_rate(sr)
Checks if dataset was resampled before.
For now, only checks if the folder {audio_path}{sr} exists and each wav file present in audio_path is also present in {audio_path}{sr}.
Parameters: - sr : int
Sampling rate.
Returns: - bool
True if the dataset was resampled before.
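The check described above (the folder {audio_path}{sr} exists and mirrors every wav file in audio_path) might look roughly like the following. It is a sketch under stated assumptions, not the library's code: the function name was_resampled is hypothetical, and walking subfolders recursively is an assumption.

```python
import os
import tempfile


def was_resampled(audio_path, sr):
    """Return True if '{audio_path}{sr}' exists and holds every wav found in audio_path."""
    resampled_path = f"{audio_path}{sr}"
    if not os.path.isdir(resampled_path):
        return False
    for root, _dirs, files in os.walk(audio_path):
        # Relative location of this folder inside audio_path.
        rel = os.path.relpath(root, audio_path)
        for name in files:
            if not name.lower().endswith(".wav"):
                continue
            if not os.path.isfile(os.path.join(resampled_path, rel, name)):
                return False
    return True


# Walk through the three possible states with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    audio = os.path.join(tmp, "audio")
    os.makedirs(audio)
    open(os.path.join(audio, "a.wav"), "w").close()
    missing_folder = was_resampled(audio, 22050)

    os.makedirs(audio + "22050")
    missing_file = was_resampled(audio, 22050)

    open(os.path.join(audio + "22050", "a.wav"), "w").close()
    complete = was_resampled(audio, 22050)
```

Note that a check of this kind confirms file presence only; it does not open the files to verify their actual sampling rate.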
-
convert_to_wav(remove_original=False)
Converts each file in the dataset to wav format.
If remove_original is True, the original files will be deleted.
Parameters: - remove_original : bool
Remove original files.
-
download(zenodo_url, zenodo_files, force_download=False)
Downloads and decompresses the dataset from zenodo.
Parameters: - zenodo_url : str
URL with the zenodo files. e.g. ‘https://zenodo.org/record/12345/files’
- zenodo_files : list of str
List of files. e.g. [‘file1.tar.gz’, ‘file2.tar.gz’, ‘file3.tar.gz’]
- force_download : bool
If True, download the dataset even if it was downloaded before.
Returns: - bool
True if the downloading process was successful.
-
generate_file_lists()
Create self.file_lists, a dict that includes a list of files per fold.
Calls dataset.generate_file_lists() on the wrapped dataset and copies the resulting attribute.
-
get_annotations(file_path, features)
Returns the annotations of the file in file_path.
Parameters: - file_path : str
Path to the file
- features : ndarray
nD array with the features of file_path
- time_resolution : float
Time resolution of the features
Returns: - ndarray
Annotations of the file file_path. Expected output shape: (features.shape[0], len(self.label_list))
-
get_audio_paths(sr=None)
Returns a list of paths to the folders that include the dataset augmented files.
The folder of each augmentation is defined using its name and parameter values.
e.g. {DATASET_PATH}/audio/pitch_shift_1 where 1 is the ‘n_semitones’ parameter.
Parameters: - sr : int or None, optional
Sampling rate (optional). This parameter is kept for compatibility with the Dataset.get_audio_paths() method.
Returns: - audio_path : str
Path to the root audio folder. e.g. DATASET_PATH/audio
- subfolders : list of str
List of subfolders included in the audio folder. e.g.:
[ '{DATASET_PATH}/audio/original', '{DATASET_PATH}/audio/pitch_shift_1', '{DATASET_PATH}/audio/time_stretching_1.1', ]
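To make the naming convention concrete, here is a small sketch that derives folder paths from an augmentation list by joining the type and its parameter values with underscores. The exact formatting rule used by the library (for instance, how negative or fractional values are rendered) is not stated here, so the helper augmentation_folders and its joining rule are assumptions for illustration only.

```python
def augmentation_folders(audio_path, augmentations_list):
    """Return 'original' plus one folder path per augmentation (hypothetical helper)."""
    folders = [f"{audio_path}/original"]
    for aug in augmentations_list:
        # Assumption: folder name is the type followed by each parameter value,
        # in dict order, joined with underscores.
        values = [str(v) for k, v in aug.items() if k != "type"]
        folders.append(f"{audio_path}/" + "_".join([aug["type"]] + values))
    return folders


augmentations = [
    {"type": "pitch_shift", "n_semitones": 1},
    {"type": "time_stretching", "factor": 1.1},
]
subfolders = augmentation_folders("DATASET_PATH/audio", augmentations)
```

With these inputs the sketch yields DATASET_PATH/audio/original, DATASET_PATH/audio/pitch_shift_1 and DATASET_PATH/audio/time_stretching_1.1, matching the shape of the example list above.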
-
process()
Generate augmented data for each file in dataset.
Replicate the folder structure of {DATASET_PATH}/audio/original into the folder of each augmentation.
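The folder-replication step can be sketched as below. This covers only the directory mirroring, not the audio transformation itself, and it is an illustration rather than the library's implementation: the helper name replicate_structure and the fold1/fold2 layout are hypothetical.

```python
import os
import tempfile


def replicate_structure(source_root, target_root):
    """Create under target_root every folder found under source_root (no files copied)."""
    for root, _dirs, _files in os.walk(source_root):
        rel = os.path.relpath(root, source_root)
        os.makedirs(os.path.normpath(os.path.join(target_root, rel)), exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: audio/original/fold1 and audio/original/fold2.
    original = os.path.join(tmp, "audio", "original")
    os.makedirs(os.path.join(original, "fold1"))
    os.makedirs(os.path.join(original, "fold2"))

    # Mirror that layout into the folder of one augmentation.
    target = os.path.join(tmp, "audio", "pitch_shift_1")
    replicate_structure(original, target)
    replicated = sorted(os.listdir(target))
```

Mirroring the layout first means each augmented file can be written to the same relative path as its source, so per-fold file lists remain valid for every augmentation folder.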
-
set_as_downloaded()
Saves a download.txt file in dataset_path as a downloaded flag.