____   ____    _    ____  _____                          _      _
|  _ \ / ___|  / \  / ___|| ____|     _ __ ___   ___   __| | ___| |___
| | | | |     / _ \ \___ \|  _| _____| '_ ` _ \ / _ \ / _` |/ _ \ / __|
| |_| | |___ / ___ \ ___) | |__|_____| | | | | | (_) | (_| |  __/ \__ \
|____/ \____/_/   \_\____/|_____|    |_| |_| |_|\___/ \__,_|\___|_|___/

Welcome to DCASE-models’ documentation!

DCASE-models is an open-source Python library for rapid prototyping of environmental sound analysis systems, with an emphasis on deep-learning models.

Introduction

DCASE-models is an open-source Python library for rapid prototyping of environmental sound analysis systems, with an emphasis on deep-learning models. The project is on GitHub.

Its main features / design goals are:

  • ease of use,
  • rapid prototyping of environmental sound analysis systems,
  • a simple and lightweight set of basic components that are generally part of a computational environmental audio analysis system,
  • a collection of functions for dataset handling, data preparation, feature extraction, and evaluation (most of which rely on existing tools),
  • a model interface to standardize the interaction of machine learning methods with the other system components,
  • an abstraction layer to make the library independent of the backend used to implement the machine learning model,
  • inclusion of reference implementations for several state-of-the-art algorithms.

DCASE-models is a work in progress; thus, input is always welcome.

The available documentation is limited for now, but you can help to improve it.

Installation instructions

We recommend installing DCASE-models in a dedicated virtual environment, for instance using Anaconda:

conda create -n dcase python=3.6
conda activate dcase

DCASE-models uses SoX for functions related to the datasets. You can install it in your conda environment by running:

conda install -c conda-forge sox

Before installing the library, you must install exactly one of the TensorFlow variants (CPU-only or GPU):

pip install "tensorflow<1.14" # for CPU-only version
pip install "tensorflow-gpu<1.14" # for GPU version

Then, you can install the library through the Python Package Index (PyPI) or from the source as explained below.

pypi

The simplest way to install DCASE-models is through the Python Package Index (PyPI). This will ensure that all required dependencies are fulfilled. This can be achieved by executing the following command:

pip install dcase_models

or:

sudo pip install dcase_models

to install system-wide, or:

pip install --user dcase_models

to install just for your own user.

source

If you’ve downloaded the archive manually from the releases page, you can install using the setuptools script:

tar xzf dcase_models-VERSION.tar.gz
cd dcase_models-VERSION/
python setup.py install

If you intend to develop DCASE-models or make changes to the source code, you can install with pip install -e to link to your actively developed source tree:

tar xzf dcase_models-VERSION.tar.gz
cd dcase_models-VERSION/
pip install -e .

Alternatively, the latest development version can be installed via pip:

pip install git+https://github.com/pzinemanas/dcase_models

sox

DCASE-models relies on SoX for some dataset-related functions. If you did not install it in your conda environment as shown above (conda install -c conda-forge sox), you can install it with your system package manager, e.g. apt-get install sox on Debian/Ubuntu or brew install sox on macOS.

Tutorial and examples

This is a tutorial introduction to quickly get you up and running with DCASE-models.

The library package includes a set of examples, organized into three categories, which illustrate the usefulness of DCASE-models for carrying out research experiments or developing applications. These examples can also be used as templates to be adapted when implementing specific DCASE methods. The types of examples provided are:

  • scripts that perform each step in the typical development pipeline of a DCASE task
  • Jupyter Notebooks that replicate some of the experiments reported in the literature
  • a web interface for sound classification as an example of a high-level application

The following section gives a walk-through of the example scripts provided. Then, the next section describes the library organization and exemplifies the use of the most important classes and functionalities.

Example scripts

A set of Python scripts is provided in the examples folder of the package. They perform each step in the typical development pipeline of a DCASE task, i.e. downloading a dataset, data augmentation, feature extraction, model training, fine-tuning, and model evaluation. Follow the instructions below to learn how they are used.

Parameters setting

First, note that the default parameters are stored in the parameters.json file at the root folder of the package. You can use another parameters.json file by passing its path with the -p (or --path) argument of each script.
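
For example, assuming a custom parameter file saved as my_parameters.json (the file name and path here are just placeholders), any of the scripts can be pointed to it as follows:

python download_dataset.py -p /path/to/my_parameters.json -d ESC50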

Usage information

In the following, we show how to use these scripts for the typical development pipeline, step by step. For further usage information, please check each script's instructions by typing:

python download_dataset.py --help

Dataset downloading

First, let’s start by downloading a dataset. For instance, to download the ESC-50 dataset just type:

python download_dataset.py -d ESC50

Note

Note that by default the dataset will be downloaded to the ../datasets/ESC50 folder, following the path set in the parameters.json file. You can change this path or use a different parameters.json file. The available datasets are listed in the Datasets section.

Data augmentation

If you want to use data augmentation techniques on this dataset, you can run the following script:

python data_augmentation.py -d ESC50

Note

Note that the name and the parameters of each transformation are defined in the parameters.json file. The augmentations implemented so far are pitch-shifting, time-stretching, and white noise addition. Please check the AugmentedDataset class for further information.

Feature extraction

Now, you can extract the features for each file in the dataset by typing:

python extract_features.py -d ESC50 -f MelSpectrogram

Note

Note that you have to specify the feature name with the -f argument, in this case MelSpectrogram. All the available feature representations are listed in the Features section.

Model training

Training a model is also very straightforward. For instance, to train the SB_CNN model on the ESC-50 dataset with the MelSpectrogram features extracted before, just type:

python train_model.py -d ESC50 -f MelSpectrogram -m SB_CNN -fold fold1

Note

Note that in this case you have to pass the model name and a fold name as arguments, using -m and -fold, respectively. The specified fold is the one held out for testing, meaning that it will not be used during training. All the implemented models are listed in the Implemented models section.

Model evaluation

Once the model is trained, you can evaluate it on the test set by typing:

python evaluate_model.py -d ESC50 -f MelSpectrogram -m SB_CNN -fold fold1

Note

Note that the fold specified as an argument is the one used for testing. This script prints the results obtained with the sed_eval library.

Fine-tuning

Once you have a model trained on some dataset, you can fine-tune it on another dataset. For instance, to take a model pre-trained on the ESC-50 dataset and fine-tune it on the MAVD dataset, just type:

python fine_tuning.py -od ESC50 -ofold fold1 -f MelSpectrogram -m SB_CNN -d MAVD -fold test

Note

Note that the information about the original dataset is set with the -od and -ofold arguments, while the -d and -fold arguments set the new dataset and its test fold, respectively.

Library organization

A description of the main classes and functionalities of the library is presented in this section, following the order of the typical pipeline: dataset preparation, data augmentation, feature extraction, data loading, data scaling, and model handling. Example code is provided for each step, but please check the documentation of the classes for further information.

The following is a class diagram of DCASE-models showing all the base classes and some of the implemented specializations.

[Figure: DCASE-models class diagram]

Dataset

This is the base class designed to manage a dataset, its paths, and its internal structure. It includes methods to download the data, resample the audio files, and check that both processes succeed.

The library covers several publicly available datasets related to different tasks. Please check the list of currently available datasets in the Datasets section.

Each dataset is implemented in the library as a class that inherits from Dataset. This design provides a common and simple interface to work with any dataset. For instance, to use the UrbanSound8k dataset, it is enough to initialize its class with the path to the data folder, as follows.

dataset = UrbanSound8k(DATASET_PATH)

Then, the following methods are used to download the dataset and change its sampling rate (to 22050 Hz).

dataset.download()
dataset.change_sampling_rate(22050)

Note

Note that most of the datasets devised for research include a fold split and a corresponding evaluation setup (e.g. 5-fold cross-validation). This fold split is generally carefully selected to avoid biases and data contamination. In order to keep the results comparable to those reported in the literature, DCASE-models uses, whenever available, the predefined splits for each dataset. However, the user may define different splits or evaluation setups if needed.

AugmentedDataset

The previously defined dataset instance can be expanded using data augmentation techniques. The augmentations implemented so far are pitch-shifting, time-stretching, and white noise addition. The first two are carried out by means of pysox.

An augmented version of a given dataset can be obtained by initializing an instance of the AugmentedDataset class with the dataset as a parameter, as well as a dictionary containing the name and parameters of each transformation.

aug_dataset = AugmentedDataset(dataset,
                               augmentations)
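
As an illustration only, the augmentations specification could be a list of dictionaries along these lines; the transformation names and parameter keys shown here are assumptions, so please check the default parameters.json file and the AugmentedDataset documentation for the exact format.

# Hypothetical specification, for illustration only
augmentations = [
    {"type": "pitch_shift", "n_semitones": -1},
    {"type": "time_stretching", "factor": 1.05},
    {"type": "white_noise", "snr": 60},
]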

After initialization, the following method will perform the actual augmentation and create new audio files for every dataset element according to the type and parameters of each augmentation.

aug_dataset.process()

Note

Note that the augmented dataset is indeed an instance of Dataset, so it can be used as any other dataset in the following steps of the pipeline.

FeatureExtractor

This is the base class to define different types of feature representations. It has methods to load an audio file, extract features, and save them. It can also check if the features were already extracted.

Feature representations are implemented as specializations of the base class FeatureExtractor, for instance, Spectrogram. Please check the list of currently available features in the Features section.

A FeatureExtractor is initialized with some parameters. For instance, to define a Spectrogram feature extractor the parameters are: the length and hop (in seconds) of the feature representation analysis window (sequence_time and sequence_hop_time); the window length and hop size (in samples) for the short-time Fourier transform (STFT) calculation (audio_win and audio_hop); and the audio sampling rate (sr).

features = Spectrogram(sequence_time=1.0, sequence_hop_time=0.5,
                       audio_win=1024, audio_hop=512, sr=22050)

After initialization, the following method computes the features for each audio file in the dataset.

features.extract(dataset)
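
Since an augmented dataset (see AugmentedDataset above) is itself an instance of Dataset, it can be passed to the same method, for example:

features.extract(aug_dataset)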

Once the features are extracted and saved to disk, they can be loaded using DataGenerator as explained in the following.

Note

Note that if the audio files are not sampled at the given frequency, they are converted before calculating the features.

DataGenerator

This class uses instances of Dataset and FeatureExtractor to prepare the data for model training, validation and testing. An instance of this class is created for each one of these processes.

data_gen_train = DataGenerator(dataset,
                               features,
                               train=True,
                               folds=['train'])

data_gen_val = DataGenerator(dataset,
                             features,
                             train=False,
                             folds=['val'])

At this point of the pipeline, the features and the annotations for training the model can be obtained as follows.

X_train, Y_train = data_gen_train.get_data()

Note

Note that instances of DataGenerator can be used to load data in batches. This feature is especially useful for training models on systems with memory limitations.

Scaler

Before feeding data to a model, it is common to normalize the data or scale it to a fixed minimum and maximum value. To do this, the library contains a Scaler class, based on scikit-learn preprocessing functions, that includes fit and transform methods.

scaler = Scaler("standard")
scaler.fit(X_train)
X_train = scaler.transform(X_train)
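
The same fitted scaler should then be applied to the validation and test data. For example, using the validation data generator defined above:

X_val, Y_val = data_gen_val.get_data()
X_val = scaler.transform(X_val)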

In addition, the scaler can be fitted in batches by passing the DataGenerator instance instead of the data itself.

scaler.fit(data_gen_train)

It is also possible to scale the data as it is being loaded from the disk, for instance, when training the model. To do so, the Scaler can be passed to the DataGenerator after its initialization.

data_gen_val.set_scaler(scaler)

ModelContainer

This class defines an interface to standardize the behavior of machine learning models. It stores the architecture and the parameters of the model. It provides methods to train and evaluate the model, and to save and load its architecture and weights. It also allows the inspection of the output of its intermediate stages (i.e. layers).

The library also provides a container class to define Keras models, namely KerasModelContainer, that inherits from ModelContainer, and implements its functionality using this specific machine learning backend. Even though the library currently supports only Keras, it is easy to specialize the ModelContainer class to integrate other machine learning tools, such as PyTorch.

Each model has its own class that inherits from a specific ModelContainer, such as KerasModelContainer. Please check the list of currently available models in the Implemented models section.

A model container has to be initialized with some parameters. These parameters vary across models; the most important are the input shape, the number of classes, and the evaluation metrics. Model-specific parameters may include the number of hidden layers or the number of convolutional layers, among others.

model_cont = SB_CNN(**model_params)
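
As an illustration only, the model_params dictionary used above could look roughly as follows; the exact keyword names differ between model classes and are assumptions here, so please check the documentation of each model.

# Hypothetical parameter names, for illustration only
model_params = {
    'metrics': ['classification'],  # evaluation metrics
    'n_classes': 10,                # number of output classes
    'n_frames_cnn': 64,             # input shape: time frames
    'n_freq_cnn': 128,              # input shape: frequency bins
}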

The ModelContainer class has a method to train the model. Training parameters can include, for example, number of epochs, learning rate and batch size.

model_cont.train((X_train, Y_train), **train_params)
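
Again as an illustration only, the train_params dictionary used above could be along these lines; the accepted options depend on the model container, and the key names here are assumptions.

# Hypothetical training options, for illustration only
train_params = {
    'epochs': 50,            # number of training epochs
    'batch_size': 64,        # mini-batch size
    'learning_rate': 0.001,  # optimizer learning rate
}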

To train the model in batches, the DataGenerator object can be passed to the train method instead of the pre-loaded data.

model_cont.train(data_gen_train, **train_params)

Performing model evaluation is also simple. For instance, the following code uses the test set for evaluating the model.

data_gen_test = DataGenerator(dataset,
                              features,
                              train=False,
                              folds=['test'])
X_test, Y_test = data_gen_test.get_data()
results = model_cont.evaluate((X_test, Y_test))

The format of the results depends on which metrics are used. By default, the evaluation is performed with the metrics available from the sed_eval library, and the results are presented accordingly. Nevertheless, DCASE-models also enables the use of other evaluation frameworks, such as psds_eval, or of user-defined metrics in a straightforward way.

When building deep-learning models it is common practice to use fine-tuning and transfer learning techniques. In this way, one can reuse a network that was previously trained on another dataset or for another task, and adapt it to the problem at hand. This type of approach can also be carried out with the ModelContainer.

Extending the library

This section explains how to extend the different components of DCASE-models.

Datasets

Each dataset is implemented in the library as a class that inherits from Dataset.

To include a new dataset in the library you should extend the Dataset class and implement:

  • __init__ , where you can define and store arguments related to the dataset.
  • build , where you define the fold list, label list, paths, etc.
  • generate_file_lists , where you define the dataset structure.
  • get_annotations , where you implement the function to get the annotations from a given audio file.
  • download , where you implement the steps to download and decompress the dataset.

Below we walk through all the necessary steps to implement a new dataset. Let's assume that the new dataset has two labels (dog and cat), three folds (train, validate, and test), and that the audio files are stored in DATASET_PATH/audio. In addition, the dataset has the following structure:

DATASET_PATH/
|
|- audio/
|  |- train
|  |  |- file1-0-X.wav
|  |  |- file2-1-X.wav
|  |  |- file3-0-X.wav
|  |
|  |- validate
|  |  |- file1-1-Y.wav
|  |  |- file2-0-Y.wav
|  |  |- file3-1-Y.wav
|  |
|  |- test
|  |  |- file1-1-Z.wav
|  |  |- file2-0-Z.wav
|  |  |- file3-0-Z.wav

Note that each fold has its own folder inside the audio path. Also, each file name encodes the class label after the first dash character (0 for dog, 1 for cat).

The first step is to create a new class that inherits from Dataset and implement its __init__() method. Since the only argument needed for this custom dataset is its path, we simply call the parent class initializer through super().__init__(). If your dataset needs other arguments from the user, add them here.

import os
import numpy as np

# list_wav_files and move_all_files_to_parent (used below) are helper
# functions provided by DCASE-models (see the Files functions section).
from dcase_models.data.dataset_base import Dataset


class CustomDataset(Dataset):
    def __init__(self, dataset_path):
        # Don't forget to add this line
        super().__init__(dataset_path)

Now implement the build() method. You should define here the audio_path, fold_list and label_list attributes. You can also define other attributes for your dataset.

def build(self):
    self.audio_path = os.path.join(self.dataset_path, 'audio')
    self.fold_list = ["train", "validate", "test"]
    self.label_list = ["dog", "cat"]
    self.evaluation_mode = 'train-validate-test'

The generate_file_lists() method defines the structure of the dataset. Basically this structure is defined in the self.file_lists dictionary. This dictionary stores the list of the paths to the audio files for each fold in the dataset. Note that you can use the list_wav_files() function to list all wav files in a given path.

def generate_file_lists(self):
    for fold in self.fold_list:
        audio_folder = os.path.join(self.audio_path, fold)
        self.file_lists[fold] = list_wav_files(audio_folder)

Now let's define get_annotations(). This method receives three arguments: the path to the audio file, the feature representation, and the time resolution (used when the annotations follow a fixed time grid, e.g. see URBAN_SED). Note that the first dimension (the sequence index) of the annotations and of the feature representation coincide. In this example, the label of each audio file is coded in its name, as explained before.

def get_annotations(self, file_name, features, time_resolution):
    y = np.zeros((len(features), len(self.label_list)))
    class_ix = int(os.path.basename(file_name).split('-')[1])
    y[:, class_ix] = 1
    return y

The download() method defines the steps for downloading the dataset. You can use the download() method of the parent Dataset class to download and decompress all files from Zenodo. You can also use the move_all_files_to_parent() function to move all files from a subdirectory to its parent.

def download(self, force_download=False):
    zenodo_url = "https://zenodo.org/record/1234567/files"
    zenodo_files = ["CustomDataset.tar.gz"]
    downloaded = super().download(
        zenodo_url, zenodo_files, force_download
    )
    if downloaded:
        # mv self.dataset_path/CustomDataset/* self.dataset_path/
        move_all_files_to_parent(self.dataset_path, "CustomDataset")
        # Don't forget this line
        self.set_as_downloaded()
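
With these methods implemented, the new dataset can be used just like any built-in one. For instance, following the same pattern shown in the Library organization section:

dataset = CustomDataset(DATASET_PATH)
dataset.download()
dataset.change_sampling_rate(22050)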

Note

If you implement a class for a publicly available dataset that is not yet included in the library, consider filing a GitHub issue or, even better, sending us a pull request.

Features

Feature representations are implemented as specializations of the base class FeatureExtractor.

In order to implement a new feature you should write a class that inherits from FeatureExtractor.

The methods you should reimplement are:

  • __init__ , where you can define and store the feature's arguments.
  • calculate , where you define the feature calculation process.

For instance, if you want to implement Chroma features:

import numpy as np
import librosa
from dcase_models.data.features import FeatureExtractor


class Chroma(FeatureExtractor):
    def __init__(self, sequence_time=1.0, sequence_hop_time=0.5,
                audio_win=1024, audio_hop=680, sr=22050,
                n_fft=1024, n_chroma=12, pad_mode='reflect'):

        super().__init__(sequence_time=sequence_time,
                        sequence_hop_time=sequence_hop_time,
                        audio_win=audio_win, audio_hop=audio_hop,
                        sr=sr)

        self.n_fft = n_fft
        self.n_chroma = n_chroma
        self.pad_mode = pad_mode

    def calculate(self, file_name):
        # Load the audio signal
        audio = self.load_audio(file_name)

        # Pad audio signal
        if self.pad_mode is not None:
            audio = librosa.util.fix_length(
                audio,
                audio.shape[0] + librosa.core.frames_to_samples(
                    self.sequence_frames, self.audio_hop, n_fft=self.n_fft),
                axis=0, mode=self.pad_mode
            )

        # Get the spectrogram, shape (n_freqs, n_frames)
        stft = librosa.core.stft(audio, n_fft=self.n_fft,
                                hop_length=self.audio_hop,
                                win_length=self.audio_win, center=False)
        # Convert to power
        spectrogram = np.abs(stft)**2

        # Convert to chroma_stft, shape (n_chroma, n_frames)
        chroma = librosa.feature.chroma_stft(
            S=spectrogram, sr=self.sr, n_fft=self.n_fft, n_chroma=self.n_chroma)

        # Transpose time and freq dims, shape (n_frames, n_chroma)
        chroma = chroma.T

        # Windowing, creates sequences
        chroma = np.ascontiguousarray(chroma)
        chroma = librosa.util.frame(
            chroma, self.sequence_frames, self.sequence_hop, axis=0
        )

        return chroma
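
The new feature extractor can then be used in the same way as the built-in ones. For instance, following the pattern shown in the Library organization section:

features = Chroma(sequence_time=1.0, sequence_hop_time=0.5,
                  audio_win=1024, audio_hop=680, sr=22050)
features.extract(dataset)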

Models

The models are implemented as specializations of the base class KerasModelContainer.

To include a new model in the library you should extend the KerasModelContainer class and implement the following methods:

  • __init__ , where you can define and store the model arguments.
  • build , where you define the model architecture.

Note that you might also reimplement the train() method. This is especially useful for complex models (multiple inputs and outputs, custom loss functions, etc.).

For instance, to implement a simple Convolutional Neural Network:

from keras.layers import Input, Lambda, Conv2D, MaxPooling2D
from keras.layers import Dropout, Dense, Flatten
from keras.layers import BatchNormalization
from keras.models import Model
import keras.backend as K
from dcase_models.model.container import KerasModelContainer


class CNN(KerasModelContainer):
    def __init__(self, model=None, model_path=None,
                metrics=['classification'], n_classes=10,
                n_frames=64, n_freqs=128):

        self.n_classes = n_classes
        self.n_frames = n_frames
        self.n_freqs = n_freqs

        # Don't forget this line
        super().__init__(model=model, model_path=model_path,
                         model_name='CNN', metrics=metrics)

    def build(self):
        # input
        x = Input(shape=(self.n_frames, self.n_freqs), dtype='float32', name='input')

        # expand dims
        y = Lambda(lambda x: K.expand_dims(x, -1), name='expand_dims')(x)

        # CONV 1
        y = Conv2D(24, (5, 5), padding='valid',
                   activation='relu', name='conv1')(y)
        y = MaxPooling2D(pool_size=(2, 2), strides=None,
                         padding='valid', name='maxpool1')(y)
        y = BatchNormalization(name='batchnorm1')(y)

        # CONV 2
        y = Conv2D(24, (5, 5), padding='valid',
                   activation='relu', name='conv2')(y)
        y = BatchNormalization(name='batchnorm2')(y)

        # Flatten and Dropout
        y = Flatten(name='flatten')(y)
        y = Dropout(0.5, name='dropout1')(y)

        # Dense layer
        y = Dense(self.n_classes, activation='softmax', name='out')(y)

        # Create model
        self.model = Model(inputs=x, outputs=y, name='model')

        # Don't forget this line
        super().build()
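
The new container can then be trained and evaluated like the provided models. For instance, reusing the data generators and the train_params dictionary from the Library organization section:

model_cont = CNN(n_classes=10, n_frames=64, n_freqs=128)
model_cont.train(data_gen_train, **train_params)
results = model_cont.evaluate((X_test, Y_test))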

Development

As an open-source project by researchers for researchers, we highly welcome any contribution!

What to contribute

Give feedback

To send us general feedback, questions or ideas for improvement, please post on our mailing list.

Report bugs

Please report any bugs at the issue tracker on GitHub. If you are reporting a bug, please include:

  • your version of DCASE-models,
  • steps to reproduce the bug, ideally reduced to as few commands as possible,
  • the results you obtain, and the results you expected instead.

If you are unsure whether the experienced behaviour is intended or a bug, please just ask on our mailing list first.

Fix bugs

Look for anything tagged with “bug” on the issue tracker on GitHub and fix it.

Features

Please do not hesitate to propose any ideas at the issue tracker on GitHub. Think about posting them on our mailing list first, so we can discuss them and/or guide you through the implementation.

Alternatively, you can look for anything tagged with “feature request” or “enhancement” on the issue tracker on GitHub.

Write documentation

Whenever you find something not explained well, misleading or just wrong, please update it! The Edit on GitHub link on the top right of every documentation page and the [source] link for every documented entity in the API reference will help you to quickly locate the origin of any text.

How to contribute

Edit on GitHub

As a very easy way of just fixing issues in the documentation, use the Edit on GitHub link on the top right of a documentation page or the [source] link of an entity in the API reference to open the corresponding source file in GitHub, then click the Edit this file link to edit the file in your browser and send us a Pull Request.

For any more substantial changes, please follow the steps below.

Fork the project

First, fork the project on GitHub.

Then, follow the general installation instructions and, more specifically, the installation from source. Please note that you should clone from your fork instead.

Documentation

The documentation is generated with Sphinx. To build it locally, run the following commands:

cd docs
make html

Afterwards, open docs/_build/html/index.html to view the documentation as it would appear on readthedocs. If you changed a lot and seem to get misleading error messages or warnings, run make clean html to force Sphinx to recreate all files from scratch.

When writing docstrings, follow existing documentation as much as possible to ensure consistency throughout the library. For additional information on the syntax and conventions used, please refer to the following documents:

Citation

If you use DCASE-models in your work, please consider citing it:

@inproceedings{DCASE-models,
   Title = {{DCASE-models: a Python library for Computational Environmental Sound Analysis using deep-learning models}},
   Author = {Zinemanas, Pablo and Hounie, Ignacio and Cancela, Pablo and Font, Frederic and Rocamora, Martín and Serra, Xavier},
   Booktitle = {Proceedings of the Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop},
   Month = {11},
   Year = {2020},
   Pages = {??--??},
   Address = {Tokyo, Japan},
   Doi = {10.00/00000.0000}
}

Data

Datasets

Datasets are implemented as specializations of the base class Dataset.

Dataset(dataset_path) Abstract base class to load and manage DCASE datasets.
UrbanSound8k(dataset_path) UrbanSound8k dataset.
ESC50(dataset_path) ESC-50 dataset.
ESC10(dataset_path) ESC-10 dataset.
URBAN_SED(dataset_path) URBAN-SED dataset.
SONYC_UST(dataset_path) SONYC-UST dataset.
TAUUrbanAcousticScenes2019(dataset_path) TAU Urban Acoustic Scenes 2019 dataset.
TAUUrbanAcousticScenes2020Mobile(dataset_path) TAU Urban Acoustic Scenes 2020 Mobile dataset.
TUTSoundEvents2017(dataset_path) TUT Sound Events 2017 dataset.
FSDKaggle2018(dataset_path) FSDKaggle2018 dataset.
MAVD(dataset_path) MAVD-traffic dataset.

Features

Features are implemented as specializations of the base class FeatureExtractor.

FeatureExtractor([sequence_time, …]) Abstract base class for feature extraction.
Spectrogram([sequence_time, …]) Spectrogram feature extractor.
MelSpectrogram([sequence_time, …]) MelSpectrogram feature extractor.
Openl3([sequence_time, sequence_hop_time, …]) Openl3 feature extractor.
RawAudio([sequence_time, sequence_hop_time, …]) RawAudio feature extractor.
FramesAudio([sequence_time, …]) FramesAudio feature extractor.

Augmentation

AugmentedDataset(dataset, sr, augmentations_list) Class that manages data augmentation.
WhiteNoise(snr) Implements white noise augmentation.

DataGenerator

DataGenerator(dataset, inputs, folds[, …]) Includes methods to load feature files from DCASE datasets.
KerasDataGenerator(data_generator)

Scaler

Scaler([normalizer]) Scaler object to normalize or scale the data.

Models

ModelContainer

A ModelContainer defines an interface to standardize the behavior of machine learning models. It stores the architecture and the parameters of the model. It provides methods to train and evaluate the model, and to save and load its architecture and weights.

ModelContainer([model, model_path, …]) Abstract base class to store and manage models.
KerasModelContainer([model, model_path, …]) ModelContainer for keras models.

Implemented models

Each implemented model has its own class that inherits from a specific ModelContainer, such as KerasModelContainer.

MLP([model, model_path, metrics, n_classes, …]) KerasModelContainer for a generic MLP model.
SB_CNN([model, model_path, metrics, …]) KerasModelContainer for SB_CNN model.
SB_CNN_SED([model, model_path, metrics, …]) KerasModelContainer for SB_CNN_SED model.
A_CRNN([model, model_path, metrics, …]) KerasModelContainer for A_CRNN model.
VGGish([model, model_path, metrics, …]) KerasModelContainer for VGGish model.
SMel([model, model_path, metrics, …]) KerasModelContainer for SMel model.
MST([model, model_path, metrics, mel_bands, …]) KerasModelContainer for MST model.

Utilities

Metric functions

predictions_temporal_integration(Y_predicted) Integrate temporal dimension.
evaluate_metrics(model, data, metrics, **kwargs) Calculate metrics over files with different lengths.
sed(Y_val, Y_predicted[, sequence_time_sec, …]) Calculate metrics for Sound Event Detection
classification(Y_val, Y_predicted[, label_list]) Calculate metrics for Audio Classification
tagging(Y_val, Y_predicted[, label_list]) Calculate metrics for Audio Tagging
accuracy(Y_val, Y_predicted)
ER(Y_val, Y_predicted[, sequence_time_sec, …])
F1(Y_val, Y_predicted[, sequence_time_sec, …])

Data functions

get_fold_val(fold_test, fold_list) Get the validation fold given the test fold.
evaluation_setup(fold_test, folds, …[, …]) Return an evaluation setup given by the evaluation_mode.

Events functions

contiguous_regions(act)
evaluation_setup(fold_test, folds, …[, …]) Return an evaluation setup given by the evaluation_mode.
event_roll_to_event_list(event_roll, …) Convert an event roll matrix to an event list.
tag_probabilities_to_tag_list(…[, threshold]) Convert a tag probabilities matrix to a tag list.

Files functions

save_json(path, json_string) Save a json file in the location given by path.
load_json(path) Load a json file from path.
save_pickle(X, path) Save a pickle object in the location given by path.
load_pickle(path) Load a pickle object from path.
list_all_files(path) List all files in the path including subfolders.
list_wav_files(path) List all wav files in the path including subfolders.
load_training_log(weights_folder) Load the training log files of keras.
mkdir_if_not_exists(path[, parents]) Make a directory if it does not exist.
download_files_and_unzip(dataset_folder, …) Download files from zenodo and decompress them.
move_all_files_to(source, destination) Move all files from source to destination.
move_all_files_to_parent(parent, child) Move all files in parent/child to parent/.
duplicate_folder_structure(origin_path, …) Duplicate the folder structure from the origin to the destination.
example_audio_file([index]) Get the path to an example audio file.

Callback functions

ClassificationCallback(data[, file_weights, …]) Keras callback that calculates accuracy after each epoch and saves the weights if the evaluation improves.
SEDCallback(data[, file_weights, best_F1, …]) Keras callback that calculates F1 and ER after each epoch and saves the weights if the evaluation improves.
TaggingCallback(data[, file_weights, …]) Keras callback that calculates accuracy after each epoch and saves the weights if the evaluation improves.
F1ERCallback(X_val, Y_val[, file_weights, …]) Keras callback that calculates F1 and ER after each epoch and saves the weights if the evaluation improves.

GUI functions

encode_audio(data, sr) Encode an audio signal for web applications.

UI functions

progressbar(it[, prefix, size, file]) Iterable progress bar.

Miscellaneous functions

get_class_by_name(classes_dict, class_name, …) Get a class given its name.
