dcase_models.data.MelSpectrogram

class dcase_models.data.MelSpectrogram(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, n_fft=1024, mel_bands=64, pad_mode='reflect', **kwargs)

Bases: dcase_models.data.feature_extractor.FeatureExtractor

MelSpectrogram feature extractor.

Extracts the log-scaled mel-spectrogram of an audio signal. The mel-spectrogram is calculated over the whole audio signal and then split into overlapping sequences (frames).
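The splitting step described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names `frames_per_sequence` and `split_into_sequences` are hypothetical, and the sketch assumes that sequence lengths given in seconds are converted to frame counts by truncation.

```python
def frames_per_sequence(sequence_time, sequence_hop_time, audio_hop, sr):
    """Convert sequence length and hop from seconds to spectrogram frames."""
    # One spectrogram frame is produced every audio_hop samples.
    frames_per_second = sr / audio_hop
    # Assumption: fractional frame counts are truncated.
    sequence_frames = int(sequence_time * frames_per_second)
    sequence_hop = int(sequence_hop_time * frames_per_second)
    return sequence_frames, sequence_hop


def split_into_sequences(frames, sequence_frames, sequence_hop):
    """Cut a list of spectrogram frames into overlapping fixed-length sequences."""
    sequences = []
    last_start = len(frames) - sequence_frames
    for start in range(0, last_start + 1, sequence_hop):
        sequences.append(frames[start:start + sequence_frames])
    return sequences


# Default parameters of the class signature.
seq_frames, seq_hop = frames_per_sequence(1.0, 0.5, audio_hop=680, sr=22050)

# Dummy "spectrogram": 100 frames, each holding 64 mel-band values.
spectrogram = [[0.0] * 64 for _ in range(100)]
sequences = split_into_sequences(spectrogram, seq_frames, seq_hop)

print(seq_frames, seq_hop, len(sequences))  # 32 16 5
```

With the default parameters, each sequence holds 32 frames of 64 mel bands, which agrees with the last two dimensions of the shapes printed in the Examples section. The number of sequences the library returns for a real file also depends on how the signal is padded, which this sketch does not model.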

Parameters:

n_fft : int, default=1024: Number of samples used for FFT calculation. Refer to librosa.core.stft for further information.
mel_bands : int, default=64: Number of mel bands.
pad_mode : str or None, default='reflect': Mode of padding applied to the audio signal. This argument is passed to librosa.util.fix_length for padding the signal. If pad_mode is None, no padding is applied.
kwargs: Additional keyword arguments to librosa.filters.mel.

See also

FeatureExtractor: FeatureExtractor base class
Spectrogram: Spectrogram features

Notes

Based on the librosa.core.stft and librosa.filters.mel functions.

Examples

Extract features of a given file.

>>> from dcase_models.data.features import MelSpectrogram
>>> from dcase_models.util.files import example_audio_file
>>> features = MelSpectrogram()
>>> features_shape = features.get_shape()
>>> print(features_shape)
    (21, 32, 64)
>>> file_name = example_audio_file()
>>> mel_spectrogram = features.calculate(file_name)
>>> print(mel_spectrogram.shape)
    (3, 32, 64)

Extract features for each file in a given dataset.

>>> from dcase_models.data.datasets import ESC50
>>> dataset = ESC50('../datasets/ESC50')
>>> features.extract(dataset)

__init__(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, n_fft=1024, mel_bands=64, pad_mode='reflect', **kwargs): Initialize the FeatureExtractor

Methods

`__init__`([sequence_time, sequence_hop_time, …])	Initialize the FeatureExtractor
`calculate`(file_name)	Loads an audio file and calculates features
`check_if_extracted`(dataset)	Checks if the features of each file in dataset were calculated.
`check_if_extracted_path`(path)	Checks if the features saved in path were calculated.
`extract`(dataset)	Extracts features for each file in dataset.
`get_features_path`(dataset)	Returns the path to the features folder.
`get_shape`([length_sec])	Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.
`load_audio`(file_name[, mono, …])	Loads an audio signal and converts it to mono if needed
`set_as_extracted`(path)	Saves a json file with self.__dict__.

calculate(file_name)

Loads an audio file and calculates features

Parameters:	file_name : str Path to the audio file
Returns:	ndarray feature representation of the audio signal

check_if_extracted(dataset)

Checks if the features of each file in dataset were calculated.

Calls check_if_extracted_path for each path in the dataset.

Parameters:	dataset : Dataset Instance of the dataset.
Returns:	bool True if the features were already extracted.

check_if_extracted_path(path)

Checks if the features saved in path were calculated.

Checks whether the features were calculated with the same parameters as those stored in self.__dict__.

Parameters:	path : str Path to the features folder
Returns:	bool True if the features were already extracted.
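The comparison described above can be sketched as follows. This is a minimal illustration, not the library's code: the function `parameters_match` and the file name `parameters.json` are hypothetical, and the sketch assumes the stored parameters are a flat JSON object.

```python
import json
import os
import tempfile


def parameters_match(json_path, current_params):
    """Return True if json_path exists and stores the same parameters."""
    if not os.path.exists(json_path):
        # Nothing was saved, so the features count as not extracted.
        return False
    with open(json_path) as f:
        stored_params = json.load(f)
    return stored_params == current_params


params = {"sr": 22050, "n_fft": 1024, "mel_bands": 64}

with tempfile.TemporaryDirectory() as folder:
    json_path = os.path.join(folder, "parameters.json")  # hypothetical file name
    missing = parameters_match(json_path, params)
    with open(json_path, "w") as f:
        json.dump(params, f)
    same = parameters_match(json_path, params)
    changed = parameters_match(json_path, dict(params, mel_bands=128))

print(missing, same, changed)  # False True False
```

A check of this kind lets feature extraction be skipped when the parameters are unchanged and repeated when any of them differ.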

extract(dataset)

Extracts features for each file in dataset.

Calls calculate() for each file in dataset and saves the result to the features path.

Parameters:	dataset : Dataset Instance of the dataset.

get_features_path(dataset)

Returns the path to the features folder.

Parameters:	dataset : Dataset Instance of the dataset.
Returns:	features_path : str Path to the features folder.

get_shape(length_sec=10.0)

Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.

Parameters:	length_sec : float Duration in seconds of the test signal
Returns:	tuple Shape of the feature representation

load_audio(file_name, mono=True, change_sampling_rate=True)

Loads an audio signal and converts it to mono if needed

Parameters:	file_name : str Path to the audio file
	mono : bool if True, only returns left channel
	change_sampling_rate : bool if True, the audio signal is re-sampled to self.sr
Returns:	array audio signal

set_as_extracted(path)

Saves a json file with self.__dict__.

Useful for checking whether the feature files were calculated with the same parameters.

Parameters:	path : str Path to the JSON file
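
As a rough sketch of what saving self.__dict__ to JSON looks like, the toy class below writes its attributes to a file and reads them back. `ToyExtractor` and the file name `parameters.json` are hypothetical stand-ins, not part of the library, and the sketch assumes every attribute is JSON-serializable.

```python
import json
import os
import tempfile


class ToyExtractor:
    """Hypothetical stand-in for a feature extractor; not the library class."""

    def __init__(self, sr=22050, n_fft=1024, mel_bands=64):
        self.sr = sr
        self.n_fft = n_fft
        self.mel_bands = mel_bands

    def set_as_extracted(self, path):
        # Write every instance attribute to a JSON file.
        with open(path, "w") as f:
            json.dump(self.__dict__, f, indent=2)


with tempfile.TemporaryDirectory() as folder:
    json_path = os.path.join(folder, "parameters.json")  # hypothetical file name
    ToyExtractor(mel_bands=128).set_as_extracted(json_path)
    with open(json_path) as f:
        saved = json.load(f)

print(saved)  # {'sr': 22050, 'n_fft': 1024, 'mel_bands': 128}
```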