dcase_models.data.FeatureExtractor

class dcase_models.data.FeatureExtractor(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, **kwargs)

Bases: object

Abstract base class for feature extraction.

Includes methods to load audio files, calculate features and prepare sequences.

Inherit from this class to define custom features (e.g. features.MelSpectrogram, features.Openl3).

Parameters:

sequence_time : float, default=1.0: Length (in seconds) of the feature representation analysis windows (model’s input).
sequence_hop_time : float, default=0.5: Hop time (in seconds) of the feature representation analysis windows.
audio_win : int, default=1024: Window length (in samples) for the short-time audio processing (e.g. the short-time Fourier transform, STFT).
audio_hop : int, default=680: Hop length (in samples) for the short-time audio processing (e.g. the short-time Fourier transform, STFT).
sr : int, default=22050: Sampling rate of the audio signals. If the original audio is not sampled at this rate, it is re-sampled before feature extraction.

Examples

To create a new feature representation, define a class that inherits from FeatureExtractor and implements the calculate() method:

import librosa
import numpy as np

from dcase_models.data.feature_extractor import FeatureExtractor


class Chroma(FeatureExtractor):
    def __init__(self, sequence_time=1.0, sequence_hop_time=0.5,
                 audio_win=1024, audio_hop=512, sr=44100,
                 # Add your custom parameters here
                 n_fft=1024, n_chroma=12):
        # Don't forget this line
        super().__init__(sequence_time=sequence_time,
                         sequence_hop_time=sequence_hop_time,
                         audio_win=audio_win,
                         audio_hop=audio_hop, sr=sr)

        # Store the custom parameters so calculate() can use them
        self.n_fft = n_fft
        self.n_chroma = n_chroma

        # Number of audio samples spanned by one sequence
        self.sequence_samples = int(librosa.core.frames_to_samples(
            self.sequence_frames,
            hop_length=self.audio_hop,
            n_fft=self.n_fft
        ))

    def calculate(self, file_name):
        # Define here how the chroma features are calculated
        # Load the audio signal
        audio = self.load_audio(file_name)
        # Pad the audio signal with zeros at the end
        audio = librosa.util.fix_length(
            audio,
            size=audio.shape[0] + self.sequence_samples,
            axis=0, mode='constant'
        )
        # Get the chroma features, shape (n_chroma, n_frames)
        chroma = librosa.feature.chroma_stft(y=audio,
                                             sr=self.sr,
                                             n_fft=self.n_fft,
                                             hop_length=self.audio_hop,
                                             win_length=self.audio_win,
                                             n_chroma=self.n_chroma
                                             )
        # Convert to sequences: put time on the first axis, then frame it
        chroma = np.ascontiguousarray(chroma.T)
        chroma = librosa.util.frame(chroma,
                                    frame_length=self.sequence_frames,
                                    hop_length=self.sequence_hop,
                                    axis=0
                                    )
        return chroma

Attributes:

sequence_frames : int: Number of frames equivalent to sequence_time.
sequence_hop : int: Number of frames equivalent to sequence_hop_time.
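As a rough illustration of how these two attributes relate to the constructor arguments, the sketch below converts a duration in seconds into a frame count by dividing the number of samples by the hop length. The helper name `time_to_frame_count` and the floor rounding are assumptions made for illustration; the library may round differently.

```python
def time_to_frame_count(time_sec, sr, audio_hop):
    """Number of analysis frames spanned by time_sec (hypothetical helper)."""
    # Assumption: floor(samples / hop); the library may round differently.
    return int(time_sec * sr // audio_hop)


# Class defaults: sr=22050, audio_hop=680
sequence_frames = time_to_frame_count(1.0, sr=22050, audio_hop=680)
sequence_hop = time_to_frame_count(0.5, sr=22050, audio_hop=680)
print(sequence_frames, sequence_hop)  # 32 16
```

Under this assumption, the default settings give a model input of 32 frames, advanced by 16 frames between consecutive sequences.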

__init__(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, **kwargs): Initialize the FeatureExtractor.

Methods

`__init__`([sequence_time, sequence_hop_time, …])	Initialize the FeatureExtractor
`calculate`(file_name)	Loads an audio file and calculates features
`check_if_extracted`(dataset)	Checks if the features of each file in dataset were calculated.
`check_if_extracted_path`(path)	Checks if the features saved in path were calculated.
`convert_to_sequences`(audio_representation)
`extract`(dataset)	Extracts features for each file in dataset.
`get_features_path`(dataset)	Returns the path to the features folder.
`get_shape`([length_sec])	Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.
`load_audio`(file_name[, mono, …])	Loads an audio signal and converts it to mono if needed
`pad_audio`(audio)
`set_as_extracted`(path)	Saves a json file with self.__dict__.

calculate(file_name)

Loads an audio file and calculates features

Parameters:	file_name : str Path to the audio file
Returns:	ndarray feature representation of the audio signal

check_if_extracted(dataset)

Checks if the features of each file in dataset were calculated.

Calls check_if_extracted_path for each path in the dataset.

Parameters:	dataset : Dataset Instance of the dataset.
Returns:	bool True if the features were already extracted.

check_if_extracted_path(path)

Checks if the features saved in path were calculated.

Checks whether the features were calculated with the same parameters as those in self.__dict__.

Parameters:	path : str Path to the features folder
Returns:	bool True if the features were already extracted.

convert_to_sequences(audio_representation)
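
The reference gives no description for this method. Judging by its name and by the "Convert to sequences" step in the class example, it presumably slices a time-major feature array into overlapping windows of sequence_frames frames, advanced by sequence_hop frames. The following is a minimal NumPy sketch of that idea, not the library's implementation; the function name `to_sequences` and the output layout are assumptions.

```python
import numpy as np


def to_sequences(features, sequence_frames, sequence_hop):
    """Slice a (n_frames, n_features) array into overlapping sequences.

    Hypothetical stand-in for convert_to_sequences(); returns an array of
    shape (n_sequences, sequence_frames, n_features).
    """
    n_frames = features.shape[0]
    if n_frames < sequence_frames:
        raise ValueError("input is shorter than one sequence")
    # Start index of every window that fits entirely inside the input
    starts = range(0, n_frames - sequence_frames + 1, sequence_hop)
    return np.stack([features[s:s + sequence_frames] for s in starts])


features = np.arange(10 * 3).reshape(10, 3)  # 10 frames, 3 features
sequences = to_sequences(features, sequence_frames=4, sequence_hop=2)
print(sequences.shape)  # (4, 4, 3)
```

Trailing frames that do not fill a whole window are dropped in this sketch, which is one reason to pad the audio before extracting features.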

extract(dataset)

Extracts features for each file in dataset.

Calls calculate() for each file in dataset and saves the result to the features path.

Parameters:	dataset : Dataset Instance of the dataset.

get_features_path(dataset)

Returns the path to the features folder.

Parameters:	dataset : Dataset Instance of the dataset.
Returns:	features_path : str Path to the features folder.

get_shape(length_sec=10.0)

Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.

Parameters:	length_sec : float Duration in seconds of the test signal
Returns:	tuple Shape of the feature representation

load_audio(file_name, mono=True, change_sampling_rate=True)

Loads an audio signal and converts it to mono if needed.

Parameters:

file_name : str: Path to the audio file.
mono : bool, default=True: If True, only the left channel is returned.
change_sampling_rate : bool, default=True: If True, the audio signal is re-sampled to self.sr.
Returns:	array audio signal

pad_audio(audio)

set_as_extracted(path)

Saves a JSON file with self.__dict__.

Useful for checking whether the feature files were calculated with the same parameters.

Parameters:	path : str Path to the JSON file
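
The save-then-compare idea behind set_as_extracted() and check_if_extracted_path() can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: the helper names `save_params` and `params_match`, the file name `parameters.json`, and the plain dictionary standing in for self.__dict__ are all hypothetical.

```python
import json
import os
import tempfile


def save_params(params, path):
    """Write the parameter dict to a JSON file (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(params, f, indent=2)


def params_match(params, path):
    """Return True if the JSON file at path exists and stores the same parameters."""
    if not os.path.exists(path):
        return False
    with open(path) as f:
        return json.load(f) == params


# Plain dict standing in for the extractor's self.__dict__
params = {"sequence_time": 1.0, "sequence_hop_time": 0.5,
          "audio_win": 1024, "audio_hop": 680, "sr": 22050}

with tempfile.TemporaryDirectory() as folder:
    json_path = os.path.join(folder, "parameters.json")  # assumed file name
    before = params_match(params, json_path)      # nothing saved yet
    save_params(params, json_path)
    same = params_match(params, json_path)        # identical parameters
    changed = params_match(dict(params, sr=44100), json_path)  # sr differs

print(before, same, changed)  # False True False
```

Comparing the stored parameters, rather than only checking that feature files exist, means that changing any setting such as sr causes the features to be treated as not yet extracted.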