dcase_models.data.FeatureExtractor
class dcase_models.data.FeatureExtractor(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, **kwargs)
Bases: object
Abstract base class for feature extraction.
Includes methods to load audio files, calculate features and prepare sequences.
Inherit from this class to define custom features (e.g. features.MelSpectrogram, features.Openl3).
Parameters: - sequence_time : float, default=1.0
Length (in seconds) of the feature representation analysis windows (model’s input).
- sequence_hop_time : float, default=0.5
Hop time (in seconds) of the feature representation analysis windows.
- audio_win : int, default=1024
Window length (in samples) for the short-time audio processing (e.g. the short-time Fourier transform (STFT)).
- audio_hop : int, default=680
Hop length (in samples) for the short-time audio processing (e.g. the short-time Fourier transform (STFT)).
- sr : int, default=22050
Sampling rate of the audio signals. If the original audio is not sampled at this rate, it is re-sampled before feature extraction.
Examples
To create a new feature representation, define a class that inherits from FeatureExtractor and implement its calculate() method:

    import librosa
    import numpy as np

    from dcase_models.data.feature_extractor import FeatureExtractor


    class Chroma(FeatureExtractor):
        def __init__(self, sequence_time=1.0, sequence_hop_time=0.5,
                     audio_win=1024, audio_hop=512, sr=44100,
                     # Add your custom parameters here
                     n_fft=1024, n_chroma=12):
            # Don't forget this line
            super().__init__(sequence_time=sequence_time,
                             sequence_hop_time=sequence_hop_time,
                             audio_win=audio_win, audio_hop=audio_hop,
                             sr=sr)

            # Store the custom parameters; they are used below
            self.n_fft = n_fft
            self.n_chroma = n_chroma

            self.sequence_samples = int(librosa.frames_to_samples(
                self.sequence_frames, hop_length=self.audio_hop,
                n_fft=self.n_fft
            ))

        def calculate(self, file_name):
            # Define here how to calculate the chroma features

            # Load the audio signal
            audio = self.load_audio(file_name)

            # Pad the audio signal
            audio = librosa.util.fix_length(
                audio, size=audio.shape[0] + self.sequence_samples,
                axis=0, mode='constant'
            )

            # Get the chroma features, shape (n_chroma, n_frames)
            chroma = librosa.feature.chroma_stft(
                y=audio, sr=self.sr, n_fft=self.n_fft,
                hop_length=self.audio_hop, win_length=self.audio_win,
                n_chroma=self.n_chroma
            )

            # Convert to sequences along the time axis
            # (transpose so that time is the first axis)
            chroma = np.ascontiguousarray(chroma.T)
            chroma = librosa.util.frame(
                chroma, frame_length=self.sequence_frames,
                hop_length=self.sequence_hop, axis=0
            )

            return chroma
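The last step of the example, slicing a frame-level representation into fixed-length sequences, can be illustrated without librosa. The following is a minimal pure-Python sketch, not the library's implementation: the helper name frame_sequences is hypothetical, and it assumes the representation is ordered with time as the first axis.

```python
def frame_sequences(frames, sequence_frames, sequence_hop):
    """Slice a time-ordered list of frames into overlapping sequences.

    Each sequence holds `sequence_frames` consecutive frames, and
    consecutive sequences start `sequence_hop` frames apart.
    Trailing frames that do not fill a whole sequence are dropped.
    """
    if sequence_frames <= 0 or sequence_hop <= 0:
        raise ValueError("sequence_frames and sequence_hop must be positive")
    last_start = len(frames) - sequence_frames
    return [
        frames[start:start + sequence_frames]
        for start in range(0, last_start + 1, sequence_hop)
    ]


# Ten dummy frames with two feature values each (hypothetical data).
frames = [[t, t + 0.5] for t in range(10)]
sequences = frame_sequences(frames, sequence_frames=4, sequence_hop=2)

# Prints: 4 4 2  (number of sequences, frames per sequence, features per frame)
print(len(sequences), len(sequences[0]), len(sequences[0][0]))
```

Dropping the incomplete trailing sequence is one reason the example pads the audio signal before computing features.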
Attributes: - sequence_frames : int
Number of frames equivalent to the sequence_time.
- sequence_hop : int
Number of frames equivalent to the sequence_hop_time.
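As a rough illustration of how these attributes relate to the constructor parameters, the sketch below converts a duration into a frame count. It assumes simple floor rounding of time * sr / audio_hop; the library may round differently or account for the window length, and the helper name time_to_frames is hypothetical.

```python
def time_to_frames(time_sec, sr, hop_length):
    """Number of whole hops of `hop_length` samples that fit in `time_sec`.

    Assumption for illustration: plain floor rounding, ignoring the
    analysis window length.
    """
    return int(time_sec * sr // hop_length)


# Default constructor values from the class signature.
sr = 22050
audio_hop = 680

sequence_frames = time_to_frames(1.0, sr, audio_hop)  # sequence_time=1.0
sequence_hop = time_to_frames(0.5, sr, audio_hop)     # sequence_hop_time=0.5

# Prints: 32 16
print(sequence_frames, sequence_hop)
```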
__init__(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, **kwargs)
Initialize the FeatureExtractor.
Methods
- __init__([sequence_time, sequence_hop_time, …])
Initialize the FeatureExtractor.
- calculate(file_name)
Loads an audio file and calculates features.
- check_if_extracted(dataset)
Checks whether the features of every file in the dataset have been calculated.
- check_if_extracted_path(path)
Checks whether the features saved in path have been calculated.
- convert_to_sequences(audio_representation)
- extract(dataset)
Extracts features for each file in the dataset.
- get_features_path(dataset)
Returns the path to the features folder.
- get_shape([length_sec])
Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.
- load_audio(file_name[, mono, …])
Loads an audio signal and converts it to mono if needed.
- pad_audio(audio)
- set_as_extracted(path)
Saves a JSON file with self.__dict__.
calculate(file_name)
Loads an audio file and calculates features.
Parameters: - file_name : str
Path to the audio file
Returns: - ndarray
Feature representation of the audio signal
check_if_extracted(dataset)
Checks whether the features of every file in the dataset have been calculated.
Calls check_if_extracted_path for each path in the dataset.
Parameters: - dataset : Dataset
Instance of the dataset.
Returns: - bool
True if the features were already extracted.
check_if_extracted_path(path)
Checks whether the features saved in path have been calculated.
Compares the parameters the saved features were calculated with against those in self.__dict__.
Parameters: - path : str
Path to the features folder
Returns: - bool
True if the features were already extracted.
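The idea of comparing saved parameters with the current ones can be sketched as follows. This is an illustrative stand-in rather than the library's code: the file name parameters.json and the helpers save_parameters and parameters_match are assumptions, and a plain dict plays the role of self.__dict__.

```python
import json
import os
import tempfile

PARAMS_FILE = "parameters.json"  # hypothetical file name


def save_parameters(path, params):
    """Write the extractor parameters into the features folder."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, PARAMS_FILE), "w") as f:
        json.dump(params, f)


def parameters_match(path, params):
    """Return True only if `path` holds a parameters file equal to `params`."""
    json_path = os.path.join(path, PARAMS_FILE)
    if not os.path.exists(json_path):
        return False
    with open(json_path) as f:
        saved = json.load(f)
    return saved == params


params = {"sequence_time": 1.0, "sequence_hop_time": 0.5,
          "audio_win": 1024, "audio_hop": 680, "sr": 22050}

with tempfile.TemporaryDirectory() as tmp:
    features_path = os.path.join(tmp, "features")
    before = parameters_match(features_path, params)   # nothing saved yet
    save_parameters(features_path, params)
    after = parameters_match(features_path, params)    # same parameters
    changed = parameters_match(features_path, dict(params, sr=44100))

# Prints: False True False
print(before, after, changed)
```

Under this scheme, changing any parameter (here sr) makes previously saved features count as not extracted, so they would be recomputed.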
extract(dataset)
Extracts features for each file in the dataset.
Calls calculate() for each file in the dataset and saves the result to the features path.
Parameters: - dataset : Dataset
Instance of the dataset.
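The extraction loop described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the library's implementation: the function extract_all, the stand-in dummy_calculate, and the JSON output format are all hypothetical, since the storage format is not specified here.

```python
import json
import os
import tempfile


def extract_all(audio_paths, features_path, calculate):
    """Compute features for every file and store one result file per input."""
    os.makedirs(features_path, exist_ok=True)
    written = []
    for audio_path in audio_paths:
        features = calculate(audio_path)
        # Name the output after the audio file (extension replaced).
        stem = os.path.splitext(os.path.basename(audio_path))[0]
        out_path = os.path.join(features_path, stem + ".json")
        with open(out_path, "w") as f:
            json.dump(features, f)
        written.append(out_path)
    return written


def dummy_calculate(file_name):
    # Stand-in for a real calculate(): returns a fixed-size nested list
    # without reading the file.
    return [[0.0, 0.0], [0.0, 0.0]]


with tempfile.TemporaryDirectory() as tmp:
    outputs = extract_all(["audio/a.wav", "audio/b.wav"], tmp, dummy_calculate)
    names = sorted(os.listdir(tmp))

# Prints: ['a.json', 'b.json']
print(names)
```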
get_features_path(dataset)
Returns the path to the features folder.
Parameters: - dataset : Dataset
Instance of the dataset.
Returns: - features_path : str
Path to the features folder.
get_shape(length_sec=10.0)
Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.
Parameters: - length_sec : float
Duration in seconds of the test signal
Returns: - tuple
Shape of the feature representation
load_audio(file_name, mono=True, change_sampling_rate=True)
Loads an audio signal and converts it to mono if needed.
Parameters: - file_name : str
Path to the audio file
- mono : bool
If True, only the left channel is returned.
- change_sampling_rate : bool
If True, the audio signal is re-sampled to self.sr.
Returns: - array
Audio signal