dcase_models.data.Openl3

class dcase_models.data.Openl3(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, content_type='env', input_repr='mel256', embedding_size=512)

Bases: dcase_models.data.feature_extractor.FeatureExtractor

Openl3 feature extractor.

Based on the openl3 library.

Parameters:
content_type : {'music', 'env'}, default='env'

Type of content used to train the embedding model. Refer to openl3.core.get_audio_embedding.

input_repr : {'linear', 'mel128', 'mel256'}, default='mel256'

Spectrogram representation used by the model. Refer to openl3.core.get_audio_embedding.

embedding_size : {6144 or 512}, default=512

Embedding dimensionality. Refer to openl3.core.get_audio_embedding.

pad_mode : str or None, default='reflect'

Mode of padding applied to the audio signal. This argument is passed to librosa.util.fix_length for padding the signal. If pad_mode is None, no padding is applied.

See also

FeatureExtractor
FeatureExtractor base class
Spectrogram
Spectrogram features

Examples

Extract features of a given file.

>>> from dcase_models.data.features import Openl3
>>> from dcase_models.util.files import example_audio_file
>>> features = Openl3()
>>> features_shape = features.get_shape()
>>> print(features_shape)
    (20, 512)
>>> file_name = example_audio_file()
>>> embeddings = features.calculate(file_name)
>>> print(embeddings.shape)
    (3, 512)

Extract features for each file in a given dataset.

>>> from dcase_models.data.datasets import ESC50
>>> dataset = ESC50('../datasets/ESC50')
>>> features.extract(dataset)

__init__(sequence_time=1.0, sequence_hop_time=0.5, audio_win=1024, audio_hop=680, sr=22050, content_type='env', input_repr='mel256', embedding_size=512)

Initialize the FeatureExtractor

Methods

__init__([sequence_time, sequence_hop_time, …]) Initialize the FeatureExtractor
calculate(file_name) Loads an audio file and calculates features
check_if_extracted(dataset) Checks if the features of each file in dataset were calculated.
check_if_extracted_path(path) Checks if the features saved in path were calculated.
extract(dataset) Extracts features for each file in dataset.
get_features_path(dataset) Returns the path to the features folder.
get_shape([length_sec]) Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.
load_audio(file_name[, mono, …]) Loads an audio signal and converts it to mono if needed
set_as_extracted(path) Saves a json file with self.__dict__.

calculate(file_name)

Loads an audio file and calculates features

Parameters:
file_name : str

Path to the audio file

Returns:
ndarray

Feature representation of the audio signal

check_if_extracted(dataset)

Checks if the features of each file in dataset were calculated.

Calls check_if_extracted_path for each path in the dataset.

Parameters:
dataset : Dataset

Instance of the dataset.

Returns:
bool

True if the features were already extracted.
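
As a rough illustration of the delegation described above, the dataset-level check can be read as an all() over per-folder checks. This is a minimal sketch, not the library's implementation: the helper check_all_extracted, the check_path callable, and the folder names are hypothetical.

```python
from typing import Callable, Iterable


def check_all_extracted(paths: Iterable[str],
                        check_path: Callable[[str], bool]) -> bool:
    """Return True only if every features folder passes the per-path check."""
    # all() stops at the first folder that fails the check.
    return all(check_path(path) for path in paths)


# Hypothetical state: only the first folder has been extracted.
extracted = {"features/fold1"}
result_partial = check_all_extracted(
    ["features/fold1", "features/fold2"], lambda p: p in extracted
)
result_complete = check_all_extracted(
    ["features/fold1"], lambda p: p in extracted
)
```

Under this reading, a single folder with missing or outdated features makes the whole dataset count as not extracted.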

check_if_extracted_path(path)

Checks if the features saved in path were calculated.

Checks whether the features were calculated with the same parameters as those in self.__dict__.

Parameters:
path : str

Path to the features folder

Returns:
bool

True if the features were already extracted.
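
The comparison can be sketched as loading a saved parameters file and comparing it with the current parameters. The sketch below makes several assumptions that are not confirmed by this page: the helper name parameters_match, the file name "parameters.json", and the choice to treat a missing file as "not extracted".

```python
import json
import os
import tempfile


def parameters_match(path: str, current: dict,
                     file_name: str = "parameters.json") -> bool:
    """Compare the parameters stored in the folder `path` with `current`."""
    json_path = os.path.join(path, file_name)
    if not os.path.exists(json_path):
        # Assumption: no parameters file means nothing was extracted yet.
        return False
    with open(json_path) as f:
        saved = json.load(f)
    return saved == current


params = {"sequence_time": 1.0, "sequence_hop_time": 0.5, "sr": 22050}
with tempfile.TemporaryDirectory() as folder:
    # Empty folder: no parameters file yet.
    missing = parameters_match(folder, params)
    with open(os.path.join(folder, "parameters.json"), "w") as f:
        json.dump(params, f)
    # Same parameters as the saved ones.
    same = parameters_match(folder, params)
    # A different sampling rate should invalidate the saved features.
    changed = parameters_match(folder, dict(params, sr=44100))
```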

extract(dataset)

Extracts features for each file in dataset.

Calls calculate() for each file in dataset and saves the result into the features path.

Parameters:
dataset : Dataset

Instance of the dataset.
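
The loop described above might look roughly like the following. This is a sketch under stated assumptions: extract_all and toy_calculate are hypothetical names, the file list is passed in directly instead of being read from a Dataset, and JSON is used as a stand-in storage format so the example needs only the standard library.

```python
import json
import os
import tempfile
from typing import Callable, List


def extract_all(audio_files: List[str], features_path: str,
                calculate: Callable[[str], list]) -> List[str]:
    """Run `calculate` on each file and save one result file per input."""
    os.makedirs(features_path, exist_ok=True)
    saved = []
    for audio_file in audio_files:
        features = calculate(audio_file)
        # Name the output after the audio file, with a new extension.
        base = os.path.splitext(os.path.basename(audio_file))[0]
        out_file = os.path.join(features_path, base + ".json")
        with open(out_file, "w") as f:
            json.dump(features, f)
        saved.append(out_file)
    return saved


def toy_calculate(file_name: str) -> list:
    # Stand-in for a real feature extractor: a single 2-dimensional vector.
    return [[float(len(file_name)), 0.0]]


with tempfile.TemporaryDirectory() as root:
    outputs = extract_all(["audio/a.wav", "audio/b.wav"],
                          os.path.join(root, "features"), toy_calculate)
    saved_names = sorted(os.path.basename(p) for p in outputs)
    all_exist = all(os.path.exists(p) for p in outputs)
```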

get_features_path(dataset)

Returns the path to the features folder.

Parameters:
dataset : Dataset

Instance of the dataset.

Returns:
features_path : str

Path to the features folder.

get_shape(length_sec=10.0)

Calls calculate() with a dummy signal of length length_sec and returns the shape of the feature representation.

Parameters:
length_sec : float

Duration in seconds of the test signal

Returns:
tuple

Shape of the feature representation
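
The dummy-signal approach can be sketched as follows. The names shape_from_dummy and toy_calculate are hypothetical, and the toy extractor simply emits one embedding per hop; it stands in for the real embedding model and is only meant to show how the shape is obtained by running the extractor once on a silent signal.

```python
from typing import Callable, List, Tuple


def shape_from_dummy(calculate: Callable[[List[float]], List[List[float]]],
                     sr: int, length_sec: float = 10.0) -> Tuple[int, int]:
    """Run `calculate` on a silent signal and report the output shape."""
    dummy = [0.0] * int(round(length_sec * sr))
    features = calculate(dummy)
    return (len(features), len(features[0]))


def toy_calculate(signal, sr=22050, hop_time=0.5, embedding_size=512):
    # Assumption: one embedding vector per full hop of the signal.
    hop = int(hop_time * sr)
    n_frames = len(signal) // hop
    return [[0.0] * embedding_size for _ in range(n_frames)]


shape = shape_from_dummy(toy_calculate, sr=22050, length_sec=10.0)
print(shape)  # (20, 512) for this toy extractor
```

Measuring the shape this way avoids duplicating the framing arithmetic of the extractor, at the cost of one extra call to calculate().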

load_audio(file_name, mono=True, change_sampling_rate=True)

Loads an audio signal and converts it to mono if needed

Parameters:
file_name : str

Path to the audio file

mono : bool

If True, only the left channel is returned.

change_sampling_rate : bool

If True, the audio signal is resampled to self.sr.

Returns:
array

Audio signal
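
To illustrate what "only the left channel" means for a stereo file, the sketch below reads a 16-bit PCM WAV file with the standard library and keeps every first sample of each frame. The helper load_left_channel is hypothetical and does not resample; it is not how the library reads audio.

```python
import os
import struct
import tempfile
import wave


def load_left_channel(file_name: str):
    """Read a 16-bit PCM WAV file and return (left-channel samples, rate)."""
    with wave.open(file_name, "rb") as wav:
        n_channels = wav.getnchannels()
        sr = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # Samples are interleaved: L, R, L, R, ... for a stereo file.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Taking every n_channels-th sample keeps only the first (left) channel.
    return list(samples[::n_channels]), sr


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "stereo.wav")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit PCM
        wav.setframerate(22050)
        # Three stereo frames: left = 1, 2, 3 and right = -1, -2, -3.
        wav.writeframes(struct.pack("<6h", 1, -1, 2, -2, 3, -3))
    left, sr = load_left_channel(path)
```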

set_as_extracted(path)

Saves a json file with self.__dict__.

Useful for checking whether the feature files were calculated with the same parameters.

Parameters:
path : str

Path to the JSON file
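
A minimal sketch of the idea, assuming all attributes are JSON-serializable: the ToyExtractor class below is hypothetical and only mirrors a few of the constructor parameters listed at the top of this page.

```python
import json
import os
import tempfile


class ToyExtractor:
    """Hypothetical extractor that stores its parameters as attributes."""

    def __init__(self, sequence_time=1.0, sequence_hop_time=0.5, sr=22050):
        self.sequence_time = sequence_time
        self.sequence_hop_time = sequence_hop_time
        self.sr = sr

    def set_as_extracted(self, path):
        # Dump every instance attribute so a later run can compare them.
        with open(path, "w") as f:
            json.dump(self.__dict__, f, indent=4)


with tempfile.TemporaryDirectory() as folder:
    json_path = os.path.join(folder, "parameters.json")
    ToyExtractor(sr=16000).set_as_extracted(json_path)
    with open(json_path) as f:
        saved = json.load(f)
```

Because the whole attribute dictionary is written, any change to a constructor argument shows up as a difference when the file is compared later.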