dcase_models.data.Spectrogram
-
class dcase_models.data.Spectrogram(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, n_fft=1024, pad_mode='reflect')
Bases: dcase_models.data.feature_extractor.FeatureExtractor
Spectrogram feature extractor.
Extracts the log-scaled spectrogram of the audio signals. The spectrogram is calculated over the whole audio signal and then split into overlapping sequences (frames).
Parameters: - n_fft : int, default=1024
Number of samples used for FFT calculation. Refer to librosa.core.stft for further information.
- pad_mode : str or None, default='reflect'
Mode of padding applied to the audio signal. This argument is passed to librosa.util.fix_length for padding the signal. If pad_mode is None, no padding is applied.
See also
FeatureExtractor - FeatureExtractor base class.
MelSpectrogram - MelSpectrogram feature extractor.
Notes
Based on the librosa.core.stft function.
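The two-stage process described above (a spectrogram over the whole signal, then a split into overlapping sequences) can be sketched with NumPy alone. This is an illustrative sketch, not the library's implementation: the helper names `frames_per_sequence`, `log_spectrogram` and `split_into_sequences` are hypothetical, the Hann window, the absence of padding, the truncating conversion from seconds to frames and the `10 * log10` scaling are all assumptions, so the number of sequences can differ from what Spectrogram returns.

```python
import numpy as np


def frames_per_sequence(seconds, sr=22050, audio_hop=680):
    """Convert a duration in seconds to a number of STFT frames (assumed truncation)."""
    return int(seconds * sr / audio_hop)


def log_spectrogram(audio, n_fft=1024, audio_win=1024, audio_hop=680):
    """Log-scaled power spectrogram with shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(audio_win)  # assumed window type
    n_frames = 1 + (len(audio) - audio_win) // audio_hop  # no padding in this sketch
    frames = np.stack([
        audio[i * audio_hop:i * audio_hop + audio_win] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return 10.0 * np.log10(power + 1e-10)  # assumed log scaling


def split_into_sequences(spec, sequence_frames, sequence_hop_frames):
    """Cut a (n_frames, n_bins) spectrogram into overlapping sequences."""
    n_sequences = 1 + (spec.shape[0] - sequence_frames) // sequence_hop_frames
    return np.stack([
        spec[i * sequence_hop_frames:i * sequence_hop_frames + sequence_frames]
        for i in range(n_sequences)
    ])


# Two seconds of noise at the default sampling rate.
audio = np.random.RandomState(0).randn(2 * 22050)
spec = log_spectrogram(audio)
sequences = split_into_sequences(
    spec,
    frames_per_sequence(1.0),  # sequence_time=1.0
    frames_per_sequence(0.5),  # sequence_hop_time=0.5
)
print(sequences.shape)  # (n_sequences, frames per sequence, n_fft // 2 + 1)
```

With the default parameters, each sequence spans 32 frames and each frame has n_fft // 2 + 1 = 513 frequency bins, which agrees with the last two dimensions of the shapes shown in the examples on this page.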
Examples
Extract features of a given file.
>>> from dcase_models.data.features import Spectrogram
>>> from dcase_models.util.files import example_audio_file
>>> features = Spectrogram()
>>> features_shape = features.get_shape()
>>> print(features_shape)
(21, 32, 513)
>>> file_name = example_audio_file()
>>> spectrogram = features.calculate(file_name)
>>> print(spectrogram.shape)
(3, 32, 513)
Extract features for each file in a given dataset.
>>> from dcase_models.data.datasets import ESC50
>>> dataset = ESC50('../datasets/ESC50')
>>> features.extract(dataset)
-
__init__(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, n_fft=1024, pad_mode='reflect')
Initialize the FeatureExtractor.
Methods
__init__([sequence_time, sequence_hop_time, …]): Initialize the FeatureExtractor.
calculate(file_name): Loads an audio file and calculates features.
check_if_extracted(dataset): Checks if the features of each file in dataset were calculated.
check_if_extracted_path(path): Checks if the features saved in path were calculated.
extract(dataset): Extracts features for each file in dataset.
get_features_path(dataset): Returns the path to the features folder.
get_shape([length_sec]): Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.
load_audio(file_name[, mono, …]): Loads an audio signal and converts it to mono if needed.
set_as_extracted(path): Saves a JSON file with self.__dict__.
-
calculate(file_name)
Loads an audio file and calculates features.
Parameters: - file_name : str
Path to the audio file
Returns: - ndarray
feature representation of the audio signal
-
check_if_extracted(dataset)
Checks if the features of each file in dataset were calculated.
Calls check_if_extracted_path for each path in the dataset.
Parameters: - dataset : Dataset
Instance of the dataset.
Returns: - bool
True if the features were already extracted.
-
check_if_extracted_path(path)
Checks if the features saved in path were calculated.
Compares the parameters saved with the features against self.__dict__ to confirm that they were calculated with the same parameters.
Parameters: - path : str
Path to the features folder
Returns: - bool
True if the features were already extracted.
-
extract(dataset)
Extracts features for each file in dataset.
Calls calculate() for each file in dataset and saves the result into the features path.
Parameters: - dataset : Dataset
Instance of the dataset.
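The loop that extract() describes (calculate the features of each file, then save them into the features folder) might look roughly like the sketch below. It does not use the library: `extract_all` is a hypothetical helper, the `.npy` output format and the file naming are assumptions, and `fake_calculate` is a stand-in for a real feature extractor's calculate() method.

```python
import os
import tempfile

import numpy as np


def extract_all(file_names, calculate, features_path):
    """Run `calculate` on every file and save each result under `features_path`."""
    os.makedirs(features_path, exist_ok=True)
    saved = []
    for file_name in file_names:
        features = calculate(file_name)
        # Assumed naming: same base name as the audio file, .npy extension.
        base = os.path.splitext(os.path.basename(file_name))[0]
        out_path = os.path.join(features_path, base + ".npy")
        np.save(out_path, features)
        saved.append(out_path)
    return saved


def fake_calculate(file_name):
    """Stand-in for calculate(): returns an array shaped like the documented output."""
    return np.zeros((3, 32, 513))


with tempfile.TemporaryDirectory() as features_path:
    saved = extract_all(["audio/one.wav", "audio/two.wav"], fake_calculate, features_path)
    saved_names = [os.path.basename(p) for p in saved]
    loaded_shape = np.load(saved[0]).shape
```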
-
get_features_path(dataset)
Returns the path to the features folder.
Parameters: - dataset : Dataset
Instance of the dataset.
Returns: - features_path : str
Path to the features folder.
-
get_shape(length_sec=10.0)
Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.
Parameters: - length_sec : float
Duration in seconds of the test signal
Returns: - tuple
Shape of the feature representation
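The dummy-signal idea behind get_shape() can be illustrated independently of the library. In this sketch, `shape_from_dummy_signal` and `toy_features` are hypothetical names, and the use of an all-zeros signal is an assumption; the point is only that the output shape is discovered by running the feature function once on a signal of the requested duration.

```python
import numpy as np


def shape_from_dummy_signal(feature_fn, sr=22050, length_sec=10.0):
    """Return the shape produced by `feature_fn` for a signal of `length_sec` seconds."""
    dummy = np.zeros(int(length_sec * sr))  # assumed: silence as the test signal
    return feature_fn(dummy).shape


def toy_features(audio, hop=680):
    """Toy feature function: cut the signal into non-overlapping blocks of `hop` samples."""
    n_blocks = len(audio) // hop
    return audio[:n_blocks * hop].reshape(n_blocks, hop)


shape = shape_from_dummy_signal(toy_features, sr=22050, length_sec=1.0)
print(shape)  # (32, 680)
```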
-
load_audio(file_name, mono=True, change_sampling_rate=True)
Loads an audio signal and converts it to mono if needed.
Parameters: - file_name : str
Path to the audio file
- mono : bool
if True, only returns left channel
- change_sampling_rate : bool
if True, the audio signal is re-sampled to self.sr
Returns: - array
audio signal
-
set_as_extracted(path)
Saves a JSON file with self.__dict__.
Useful for checking whether the feature files were calculated with the same parameters.
Parameters: - path : str
Path to the JSON file
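The interplay between set_as_extracted() and check_if_extracted_path() (save the extractor parameters as JSON, then compare them on a later run) can be sketched as follows. This is a stand-alone illustration, not the library's code: `save_parameters`, `parameters_match` and the file name `parameters.json` are hypothetical, and a plain dict stands in for self.__dict__.

```python
import json
import os
import tempfile

PARAMS_FILE = "parameters.json"  # hypothetical file name


def save_parameters(params, json_path):
    """Write the extractor parameters to a JSON file."""
    with open(json_path, "w") as f:
        json.dump(params, f)


def parameters_match(params, features_path):
    """True if the parameters saved in `features_path` equal `params`."""
    json_path = os.path.join(features_path, PARAMS_FILE)
    if not os.path.exists(json_path):
        return False  # nothing saved yet, so features are treated as not extracted
    with open(json_path) as f:
        return json.load(f) == params


# Stand-in for self.__dict__, using the default arguments listed on this page.
params = {"sequence_time": 1.0, "sequence_hop_time": 0.5, "audio_win": 1024,
          "audio_hop": 680, "sr": 22050, "n_fft": 1024, "pad_mode": "reflect"}

with tempfile.TemporaryDirectory() as features_path:
    before = parameters_match(params, features_path)
    save_parameters(params, os.path.join(features_path, PARAMS_FILE))
    after = parameters_match(params, features_path)
    changed = parameters_match(dict(params, n_fft=2048), features_path)
```

Here `before` is False because no file exists yet, `after` is True, and `changed` is False because n_fft differs from the saved value, so features computed with other parameters would be recalculated rather than reused.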