dcase_models.data.MelSpectrogram

class dcase_models.data.MelSpectrogram(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, n_fft=1024, mel_bands=64, pad_mode='reflect', **kwargs)

Bases: dcase_models.data.feature_extractor.FeatureExtractor

MelSpectrogram feature extractor.

Extracts the log-scaled mel-spectrogram of the audio signals. The mel-spectrogram is computed over the whole audio signal and then split into overlapping sequences of frames.
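The split into overlapping sequences can be sketched in NumPy as follows. This is a minimal illustration, not the library's code. It assumes that sequence_time and sequence_hop_time are converted to frame counts by flooring seconds * sr / audio_hop and that leftover frames at the end are dropped. The helpers time_to_frames and split_into_sequences are hypothetical. With the default sr=22050 and audio_hop=680, one second corresponds to 32 frames, which matches the second dimension of the shapes printed in the examples.

```python
import numpy as np

def time_to_frames(seconds, sr=22050, audio_hop=680):
    # Assumption: a duration in seconds maps to floor(seconds * sr / audio_hop) frames.
    return int(seconds * sr / audio_hop)

def split_into_sequences(spec, sequence_frames, sequence_hop_frames):
    """Split a (n_frames, n_bands) spectrogram into overlapping sequences.

    Returns an array of shape (n_sequences, sequence_frames, n_bands).
    Trailing frames that do not fill a whole sequence are dropped.
    """
    n_frames = spec.shape[0]
    if n_frames < sequence_frames:
        raise ValueError("spectrogram is shorter than one sequence")
    n_sequences = 1 + (n_frames - sequence_frames) // sequence_hop_frames
    starts = np.arange(n_sequences) * sequence_hop_frames
    return np.stack([spec[s:s + sequence_frames] for s in starts])

seq = time_to_frames(1.0)   # 32 frames with the default sr and audio_hop
hop = time_to_frames(0.5)   # 16 frames
spec = np.random.rand(100, 64)  # stand-in for a (frames, mel_bands) log-mel spectrogram
sequences = split_into_sequences(spec, seq, hop)
print(sequences.shape)  # (5, 32, 64)
```

With a 16-frame hop and 32-frame sequences, consecutive sequences share half of their frames.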

Parameters:
n_fft : int, default=1024

Number of samples used for FFT calculation. Refer to librosa.core.stft for further information.

mel_bands : int, default=64

Number of mel bands.

pad_mode : str or None, default='reflect'

Mode of padding applied to the audio signal. This argument is passed to librosa.util.fix_length for padding the signal. If pad_mode is None, no padding is applied.

kwargs

Additional keyword arguments passed to librosa.filters.mel.
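The pad_mode behavior can be illustrated with np.pad. This sketch assumes, as librosa.util.fix_length does, that the signal is padded at the end (or truncated) to a fixed target length. The target length that MelSpectrogram actually uses is not stated here, so it is an explicit argument of the hypothetical helper pad_signal.

```python
import numpy as np

def pad_signal(audio, target_length, pad_mode="reflect"):
    # Hypothetical helper mirroring the documented rule:
    # pad_mode=None means the signal is returned untouched.
    if pad_mode is None:
        return audio
    if len(audio) >= target_length:
        return audio[:target_length]  # truncate, like librosa.util.fix_length
    pad_width = target_length - len(audio)
    # Pad only at the end of the signal using the requested NumPy pad mode.
    return np.pad(audio, (0, pad_width), mode=pad_mode)

signal = np.array([1.0, 2.0, 3.0, 4.0])
padded = pad_signal(signal, 6)                   # [1, 2, 3, 4, 3, 2]
trimmed = pad_signal(signal, 2)                  # [1, 2]
untouched = pad_signal(signal, 6, pad_mode=None) # unchanged
```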

See also

FeatureExtractor
FeatureExtractor base class
Spectrogram
Spectrogram features

Notes

Based on the librosa.core.stft and librosa.filters.mel functions.
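A rough NumPy-only picture of that pipeline (power STFT, mel filterbank, logarithm) is sketched below. It is not the library's implementation. It uses a symmetric np.hanning window (librosa uses a periodic Hann window), reflect-pads the signal so that frames are centered as with librosa's center=True, and assumes a decibel scale of 10 * log10. The mel_basis argument stands in for the matrix returned by librosa.filters.mel, which has shape (mel_bands, n_fft // 2 + 1).

```python
import numpy as np

def power_stft(audio, n_fft=1024, audio_hop=680, audio_win=1024):
    # Hann window of length audio_win, zero-padded (centered) to n_fft.
    window = np.hanning(audio_win)
    if audio_win < n_fft:
        left = (n_fft - audio_win) // 2
        window = np.pad(window, (left, n_fft - audio_win - left))
    # Reflect-pad so that frame t is centered on sample t * audio_hop.
    padded = np.pad(audio, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // audio_hop
    frames = np.stack([padded[t * audio_hop:t * audio_hop + n_fft] * window
                       for t in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)

def log_mel(power, mel_basis, eps=1e-10):
    # Project the power spectrum onto the mel bands, then take decibels.
    mel = power @ mel_basis.T                        # (n_frames, mel_bands)
    return 10.0 * np.log10(np.maximum(mel, eps))

rng = np.random.default_rng(0)
audio = rng.standard_normal(22050)   # 1 s of noise at sr=22050
power = power_stft(audio)            # (33, 513)
mel_basis = rng.random((64, 513))    # stand-in for librosa.filters.mel(...)
features = log_mel(power, mel_basis) # (33, 64)
```

The frame count 1 + len(audio) // audio_hop = 33 is the same count librosa's centered STFT gives for this input length.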

Examples

Extract features of a given file.

>>> from dcase_models.data.features import MelSpectrogram
>>> from dcase_models.util.files import example_audio_file
>>> features = MelSpectrogram()
>>> features_shape = features.get_shape()
>>> print(features_shape)
(21, 32, 64)
>>> file_name = example_audio_file()
>>> mel_spectrogram = features.calculate(file_name)
>>> print(mel_spectrogram.shape)
(3, 32, 64)

Extract features for each file in a given dataset.

>>> from dcase_models.data.datasets import ESC50
>>> dataset = ESC50('../datasets/ESC50')
>>> features.extract(dataset)

__init__(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, n_fft=1024, mel_bands=64, pad_mode='reflect', **kwargs)

Initializes the FeatureExtractor.

Methods

__init__([sequence_time, sequence_hop_time, …]) Initializes the FeatureExtractor.
calculate(file_name) Loads an audio file and calculates features.
check_if_extracted(dataset) Checks if the features of each file in dataset were calculated.
check_if_extracted_path(path) Checks if the features saved in path were calculated.
extract(dataset) Extracts features for each file in dataset.
get_features_path(dataset) Returns the path to the features folder.
get_shape([length_sec]) Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.
load_audio(file_name[, mono, …]) Loads an audio signal and converts it to mono if needed.
set_as_extracted(path) Saves a JSON file with self.__dict__.

calculate(file_name)

Loads an audio file and calculates features

Parameters:
file_name : str

Path to the audio file

Returns:
ndarray

feature representation of the audio signal

check_if_extracted(dataset)

Checks if the features of each file in dataset were calculated.

Calls check_if_extracted_path for each path in the dataset.

Parameters:
dataset : Dataset

Instance of the dataset.

Returns:
bool

True if the features were already extracted.

check_if_extracted_path(path)

Checks if the features saved in path were calculated.

Checks whether the features were calculated with the same parameters as those in self.__dict__.

Parameters:
path : str

Path to the features folder

Returns:
bool

True if the features were already extracted.
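The parameter comparison can be sketched with the standard json module, since set_as_extracted stores self.__dict__ as a JSON file. The file name parameters.json and the helper params_match are hypothetical; the actual file name and comparison rules used by the library may differ.

```python
import json
import os
import tempfile

PARAMS_FILE = "parameters.json"  # hypothetical file name

def params_match(path, params):
    """Return True if path/PARAMS_FILE exists and stores the same parameters."""
    json_path = os.path.join(path, PARAMS_FILE)
    if not os.path.isfile(json_path):
        return False
    with open(json_path) as f:
        saved = json.load(f)
    # Round-trip params through JSON so that tuples become lists, as in the file.
    return saved == json.loads(json.dumps(params))

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, PARAMS_FILE), "w") as f:
        json.dump({"sr": 22050, "mel_bands": 64}, f)
    same = params_match(tmp, {"sr": 22050, "mel_bands": 64})
    different = params_match(tmp, {"sr": 22050, "mel_bands": 128})
missing = params_match(tmp, {"sr": 22050})  # the directory no longer exists
```

The JSON round trip matters because a parameter stored as a tuple would otherwise never compare equal to the list read back from disk.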

extract(dataset)

Extracts features for each file in dataset.

Calls calculate() for each file in dataset and saves the result in the features path.

Parameters:
dataset : Dataset

Instance of the dataset.
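The extraction loop can be sketched as below. It is a hypothetical stand-in, not the library's extract(): it takes a plain list of file names instead of a Dataset, accepts any calculate function, and assumes one .npy file per audio file named after the audio file. The real folder layout is whatever get_features_path returns.

```python
import os
import tempfile
import numpy as np

def extract_all(file_names, calculate, features_path):
    """Hypothetical sketch: compute features per file and save them as .npy."""
    os.makedirs(features_path, exist_ok=True)
    saved = []
    for file_name in file_names:
        features = calculate(file_name)
        base = os.path.splitext(os.path.basename(file_name))[0]
        out_path = os.path.join(features_path, base + ".npy")
        np.save(out_path, features)
        saved.append(out_path)
    return saved

def dummy_calculate(file_name):
    # Stand-in for MelSpectrogram.calculate: returns a fixed-shape array.
    return np.zeros((3, 32, 64))

with tempfile.TemporaryDirectory() as tmp:
    paths = extract_all(["audio/a.wav", "audio/b.wav"], dummy_calculate,
                        os.path.join(tmp, "MelSpectrogram"))
    names = sorted(os.path.basename(p) for p in paths)
    loaded_shape = np.load(paths[0]).shape
```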

get_features_path(dataset)

Returns the path to the features folder.

Parameters:
dataset : Dataset

Instance of the dataset.

Returns:
features_path : str

Path to the features folder.

get_shape(length_sec=10.0)

Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.

Parameters:
length_sec : float, default=10.0

Duration in seconds of the test signal.

Returns:
tuple

Shape of the feature representation
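The idea behind get_shape can be sketched as follows: build a silent dummy signal of length_sec seconds and report the shape of the computed features. The sketch assumes a compute function that accepts an array directly, whereas calculate() is documented as taking a file name; get_shape_sketch and toy_compute are hypothetical. Silence is sufficient because only the shape of the result matters.

```python
import numpy as np

def get_shape_sketch(compute, length_sec=10.0, sr=22050):
    # A silent signal of length_sec seconds; the content does not affect the shape.
    dummy = np.zeros(int(length_sec * sr))
    return compute(dummy).shape

def toy_compute(audio, hop=680):
    # Toy feature: non-overlapping blocks of `hop` samples.
    n_blocks = len(audio) // hop
    return audio[:n_blocks * hop].reshape(n_blocks, hop)

shape = get_shape_sketch(toy_compute)  # (324, 680): 220500 samples // 680
```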

load_audio(file_name, mono=True, change_sampling_rate=True)

Loads an audio signal and converts it to mono if needed

Parameters:
file_name : str

Path to the audio file

mono : bool, default=True

If True, only the left channel is returned.

change_sampling_rate : bool, default=True

If True, the audio signal is resampled to self.sr.

Returns:
array

audio signal
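Keeping only the left channel can be sketched as below. The (n_samples, n_channels) layout is an assumption (it is the layout soundfile.read returns for multichannel files); other loaders, such as librosa.load with mono=False, return channels first. The helper to_mono_left is hypothetical.

```python
import numpy as np

def to_mono_left(audio):
    """Keep only the left (first) channel of a multichannel signal.

    Assumes a (n_samples, n_channels) layout; 1-D input is already mono
    and is returned unchanged.
    """
    if audio.ndim == 1:
        return audio
    return audio[:, 0]

stereo = np.stack([np.ones(5), -np.ones(5)], axis=1)  # shape (5, 2)
left = to_mono_left(stereo)
```

Taking the left channel discards the other channels entirely; librosa.to_mono instead averages all channels, which is the usual alternative.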

set_as_extracted(path)

Saves a JSON file with self.__dict__.

Useful for checking whether the feature files were calculated with the same parameters.

Parameters:
path : str

Path to the JSON file
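Writing the parameters can be sketched with the standard json module. Whether the library's self.__dict__ contains values that JSON cannot encode is not stated here; this hypothetical save_params helper simply skips such values (for example a NumPy array under an illustrative key mel_basis) rather than failing.

```python
import json
import os
import tempfile
import numpy as np

def save_params(json_path, params):
    """Hypothetical sketch: dump the JSON-serializable entries of params."""
    serializable = {}
    for key, value in params.items():
        try:
            json.dumps(value)
        except TypeError:
            continue  # skip values such as NumPy arrays
        serializable[key] = value
    with open(json_path, "w") as f:
        json.dump(serializable, f, indent=2)
    return serializable

params = {"sr": 22050, "mel_bands": 64, "pad_mode": "reflect",
          "mel_basis": np.zeros((64, 513))}  # illustrative key and value
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "parameters.json")
    written = save_params(path, params)
    with open(path) as f:
        reloaded = json.load(f)
```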