dcase_models.data.Spectrogram

class dcase_models.data.Spectrogram(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, n_fft=1024, pad_mode='reflect')

Bases: dcase_models.data.feature_extractor.FeatureExtractor

Spectrogram feature extractor.

Extracts the log-scaled spectrogram of the audio signals. The spectrogram is calculated over the whole audio signal and then split into overlapping sequences (frames).
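The two steps described above (a log-scaled spectrogram over the whole signal, then overlapping sequences) can be sketched with NumPy alone. This is not the library's code: the Hann window, the `10 * log10` dB conversion, the `eps` floor and the helper names `log_spectrogram` and `to_sequences` are assumptions for illustration; the class itself relies on librosa.core.stft.

```python
import numpy as np

def log_spectrogram(audio, n_fft=1024, win_length=1024, hop_length=680, eps=1e-10):
    """Hypothetical helper: log-power spectrogram, shape (n_frames, n_fft // 2 + 1).

    Assumes len(audio) >= win_length; no centering or padding is done here.
    """
    n_frames = 1 + (len(audio) - win_length) // hop_length
    window = np.hanning(win_length)  # assumed window choice
    frames = np.stack([
        audio[i * hop_length:i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    # rfft uses n_fft points per frame and keeps the n_fft // 2 + 1 non-negative frequencies
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return 10.0 * np.log10(power + eps)  # power in dB; eps avoids log(0)

def to_sequences(spec, sequence_frames=32, sequence_hop=16):
    """Hypothetical helper: split (n_frames, n_bins) into overlapping sequences."""
    n_sequences = 1 + (spec.shape[0] - sequence_frames) // sequence_hop
    return np.stack([
        spec[i * sequence_hop:i * sequence_hop + sequence_frames]
        for i in range(n_sequences)
    ])

sr = 22050
audio = np.random.default_rng(0).standard_normal(3 * sr)  # 3 s of white noise
spec = log_spectrogram(audio)
sequences = to_sequences(spec)
print(spec.shape, sequences.shape)  # (96, 513) (5, 32, 513)
```

With a sequence hop of half the sequence length, consecutive sequences share 50% of their frames.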

Parameters:
    n_fft : int, default=1024
        Number of samples used for the FFT calculation. Refer to librosa.core.stft for further information.
    pad_mode : str or None, default='reflect'
        Mode of padding applied to the audio signal. This argument is passed to librosa.util.fix_length for padding the signal. If pad_mode is None, no padding is applied.
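The effect of `pad_mode` can be illustrated with `numpy.pad`, since librosa.util.fix_length forwards extra keyword arguments such as `mode` to `numpy.pad`. How much padding the extractor actually adds is not stated here, so the target length and the helper `pad_to_length` are hypothetical.

```python
import numpy as np

def pad_to_length(audio, target_len, pad_mode="reflect"):
    """Hypothetical helper: pad (or trim) a 1-D signal to target_len samples.

    pad_mode=None leaves the signal untouched, as documented for pad_mode.
    """
    if pad_mode is None:
        return audio
    if len(audio) >= target_len:
        # fix_length also trims signals that are longer than the target
        return audio[:target_len]
    return np.pad(audio, (0, target_len - len(audio)), mode=pad_mode)

x = np.array([1.0, 2.0, 3.0, 4.0])
print(pad_to_length(x, 7))        # [1. 2. 3. 4. 3. 2. 1.]
print(pad_to_length(x, 7, None))  # [1. 2. 3. 4.]
```

With `'reflect'`, the signal is mirrored around its last sample (which is not repeated), so no artificial silence is appended at the end.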

See also

FeatureExtractor: FeatureExtractor base class.
MelSpectrogram: MelSpectrogram feature extractor.

Notes

Based on the librosa.core.stft function.
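Because the computation is based on librosa.core.stft, the frequency axis has `n_fft // 2 + 1` bins, which gives the 513 seen in the examples for `n_fft=1024`. The arithmetic below also converts the sequence durations into frame counts; truncating with `int()` is an assumption, chosen because it is consistent with the 32 frames per sequence in the example shapes.

```python
sr, n_fft, audio_hop = 22050, 1024, 680
sequence_time, sequence_hop_time = 1.0, 0.5

n_bins = n_fft // 2 + 1                                # STFT frequency bins
sequence_frames = int(sequence_time * sr / audio_hop)  # assumed truncation
sequence_hop = int(sequence_hop_time * sr / audio_hop)
frame_rate = sr / audio_hop                            # STFT frames per second

print(n_bins, sequence_frames, sequence_hop)  # 513 32 16
print(round(frame_rate, 2))                   # 32.43
```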

Examples

Extract features of a given file.

>>> from dcase_models.data.features import Spectrogram
>>> from dcase_models.util.files import example_audio_file
>>> features = Spectrogram()
>>> features_shape = features.get_shape()
>>> print(features_shape)
(21, 32, 513)
>>> file_name = example_audio_file()
>>> spectrogram = features.calculate(file_name)
>>> print(spectrogram.shape)
(3, 32, 513)

Extract features for each file in a given dataset.

>>> from dcase_models.data.datasets import ESC50
>>> dataset = ESC50('../datasets/ESC50')
>>> features.extract(dataset)

__init__(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, n_fft=1024, pad_mode='reflect')

Initialize the FeatureExtractor.

Methods

`__init__`([sequence_time, sequence_hop_time, …])	Initialize the FeatureExtractor
`calculate`(file_name)	Loads an audio file and calculates features
`check_if_extracted`(dataset)	Checks if the features of each file in dataset were calculated.
`check_if_extracted_path`(path)	Checks if the features saved in path were calculated.
`convert_to_sequences`(audio_representation)
`extract`(dataset)	Extracts features for each file in dataset.
`get_features_path`(dataset)	Returns the path to the features folder.
`get_shape`([length_sec])	Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.
`load_audio`(file_name[, mono, …])	Loads an audio signal and converts it to mono if needed
`pad_audio`(audio)
`set_as_extracted`(path)	Saves a json file with self.__dict__.

calculate(file_name)

Loads an audio file and calculates features.

Parameters:
    file_name : str
        Path to the audio file.
Returns:
    ndarray
        Feature representation of the audio signal.

check_if_extracted(dataset)

Checks if the features of each file in dataset were calculated.

Calls check_if_extracted_path for each path in the dataset.

Parameters:
    dataset : Dataset
        Instance of the dataset.
Returns:
    bool
        True if the features were already extracted.

check_if_extracted_path(path)

Checks if the features saved in path were calculated.

Compares whether the features were calculated with the same parameters as those in self.__dict__.

Parameters:
    path : str
        Path to the features folder.
Returns:
    bool
        True if the features were already extracted.
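A minimal sketch of the parameter comparison described above, assuming the parameters were previously stored as a JSON file in the features folder. The file name `parameters.json` and the helper `params_match` are hypothetical; the library's actual file layout may differ.

```python
import json
import os
import tempfile

def params_match(path, params, file_name="parameters.json"):
    """Hypothetical helper: True if path/file_name stores exactly `params`."""
    json_path = os.path.join(path, file_name)
    if not os.path.exists(json_path):
        return False  # nothing saved yet, so treat the features as not extracted
    with open(json_path) as f:
        saved = json.load(f)
    return saved == params

params = {"sr": 22050, "n_fft": 1024, "audio_hop": 680, "pad_mode": "reflect"}
with tempfile.TemporaryDirectory() as folder:
    before = params_match(folder, params)
    with open(os.path.join(folder, "parameters.json"), "w") as f:
        json.dump(params, f)
    after = params_match(folder, params)
    changed = params_match(folder, {**params, "n_fft": 2048})
print(before, after, changed)  # False True False
```

Comparing whole dictionaries means that changing any single parameter (here `n_fft`) invalidates previously extracted features.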

convert_to_sequences(audio_representation)

extract(dataset)

Extracts features for each file in dataset.

Calls calculate() for each file in the dataset and saves the result in the features path.

Parameters:
    dataset : Dataset
        Instance of the dataset.
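The extraction loop can be sketched as follows. The real method obtains the file list and the features path from the Dataset instance; here both are passed in explicitly, and `extract_all`, `fake_calculate` and the `.npy` naming scheme are hypothetical stand-ins.

```python
import os
import tempfile
import numpy as np

def extract_all(file_names, calculate, features_path):
    """Hypothetical helper: save calculate(file) as <features_path>/<stem>.npy."""
    os.makedirs(features_path, exist_ok=True)
    for file_name in file_names:
        features = calculate(file_name)
        stem = os.path.splitext(os.path.basename(file_name))[0]
        np.save(os.path.join(features_path, stem + ".npy"), features)

def fake_calculate(file_name):
    # Stand-in for Spectrogram.calculate: dummy (sequences, frames, bins) array
    return np.zeros((3, 32, 513))

with tempfile.TemporaryDirectory() as root:
    out = os.path.join(root, "Spectrogram")
    extract_all(["audio/dog.wav", "audio/rain.wav"], fake_calculate, out)
    saved = sorted(os.listdir(out))
    loaded_shape = np.load(os.path.join(out, "dog.npy")).shape
print(saved, loaded_shape)  # ['dog.npy', 'rain.npy'] (3, 32, 513)
```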

get_features_path(dataset)

Returns the path to the features folder.

Parameters:
    dataset : Dataset
        Instance of the dataset.
Returns:
    features_path : str
        Path to the features folder.

get_shape(length_sec=10.0)

Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.

Parameters:
    length_sec : float
        Duration in seconds of the test signal.
Returns:
    tuple
        Shape of the feature representation.
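A dummy signal is enough because the output shape depends only on the signal length, not on its content. The sketch below shows that idea with a hypothetical stand-in feature (one energy value per analysis frame) instead of the class's actual calculate(); `get_shape` and `frame_energy` here are illustrative helpers, not the library's methods.

```python
import numpy as np

def get_shape(feature_fn, length_sec=10.0, sr=22050):
    """Hypothetical helper: shape of feature_fn applied to length_sec of silence."""
    dummy = np.zeros(int(length_sec * sr))
    return feature_fn(dummy).shape

def frame_energy(audio, win=1024, hop=680):
    # Stand-in feature: one energy value per analysis frame (no padding)
    n_frames = 1 + (len(audio) - win) // hop
    return np.array([np.sum(audio[i * hop:i * hop + win] ** 2) for i in range(n_frames)])

print(get_shape(frame_energy, 10.0))  # (323,)
print(get_shape(frame_energy, 3.0))   # (96,)
```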

load_audio(file_name, mono=True, change_sampling_rate=True)

Loads an audio signal and converts it to mono if needed.

Parameters:
    file_name : str
        Path to the audio file.
    mono : bool
        If True, only the left channel is returned.
    change_sampling_rate : bool
        If True, the audio signal is resampled to self.sr.
Returns:
    array
        Audio signal.
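The two options can be sketched with NumPy on an already-loaded array. Keeping only the left (first) channel follows the parameter description above; the linear-interpolation resampler is a deliberately crude, hypothetical stand-in for a proper band-limited resampler, and `to_mono_and_rate` is an illustrative name.

```python
import numpy as np

def to_mono_and_rate(audio, sr_in, sr_out, mono=True, change_sampling_rate=True):
    """Hypothetical helper mirroring the documented load_audio options."""
    if mono and audio.ndim > 1:
        audio = audio[:, 0]  # keep only the left (first) channel
    if change_sampling_rate and sr_in != sr_out:
        # Crude linear interpolation onto the new time grid (illustration only)
        n_out = int(round(len(audio) * sr_out / sr_in))
        t_in = np.arange(len(audio)) / sr_in
        t_out = np.arange(n_out) / sr_out
        if audio.ndim == 1:
            audio = np.interp(t_out, t_in, audio)
        else:
            audio = np.stack([np.interp(t_out, t_in, ch) for ch in audio.T], axis=1)
    return audio

stereo = np.stack([np.ones(44100), np.zeros(44100)], axis=1)  # 1 s at 44.1 kHz
out = to_mono_and_rate(stereo, 44100, 22050)
print(out.shape, out[:3])  # (22050,) [1. 1. 1.]
```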

pad_audio(audio)

set_as_extracted(path)

Saves a JSON file with self.__dict__.

Useful for checking if the feature files were calculated with the same parameters.

Parameters:
    path : str
        Path to the JSON file.
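A minimal sketch of saving the instance attributes as JSON, assuming they are all JSON-serialisable (numbers, strings, None). `DummyExtractor` is a hypothetical stand-in holding a few of the constructor parameters, not the real class.

```python
import json
import os
import tempfile

class DummyExtractor:
    """Hypothetical stand-in holding a few of the constructor parameters."""

    def __init__(self, sr=22050, n_fft=1024, audio_hop=680, pad_mode="reflect"):
        self.sr = sr
        self.n_fft = n_fft
        self.audio_hop = audio_hop
        self.pad_mode = pad_mode

    def set_as_extracted(self, path):
        # Persist every instance attribute so a later run can compare parameters
        with open(path, "w") as f:
            json.dump(self.__dict__, f)

with tempfile.TemporaryDirectory() as folder:
    json_path = os.path.join(folder, "parameters.json")
    DummyExtractor().set_as_extracted(json_path)
    with open(json_path) as f:
        saved = json.load(f)
print(saved)  # {'sr': 22050, 'n_fft': 1024, 'audio_hop': 680, 'pad_mode': 'reflect'}
```

Note that JSON turns tuples into lists, so attributes stored as tuples would not compare equal after a round trip without extra handling.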