classifying-vae-lstm

music generation with a classifying variational autoencoder (VAE) and LSTM

A Classifying Variational Autoencoder with Application to Polyphonic Music Generation

This is the implementation of the Classifying VAE and Classifying VAE+LSTM models, as described in A Classifying Variational Autoencoder with Application to Polyphonic Music Generation by Jay A. Hennig, Akash Umakantha, and Ryan C. Williamson.

These models extend the standard VAE and VAE+LSTM to the case where there is a latent discrete category. In music generation, for example, we may wish to infer the key of a song so that we can generate notes consistent with that key. These discrete latents are given a Logistic Normal distribution, so that samples can still be drawn with the reparameterization trick during training.
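
For illustration, here is a minimal NumPy sketch of reparameterized sampling from a Logistic Normal (the models in this repo are written in Keras; the function and variable names below are hypothetical, not taken from the codebase). Gaussian noise is shifted and scaled by the learned mean and log-variance, then mapped onto the simplex with a softmax:

import numpy as np

def sample_logistic_normal(mu, log_var, rng):
    # Reparameterization trick: randomness enters only through eps,
    # so the sample is differentiable w.r.t. mu and log_var.
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    # The logistic (softmax) transform maps the Gaussian sample onto the
    # simplex, giving a relaxed class-probability vector (e.g. over keys).
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
w = sample_logistic_normal(np.zeros(2), np.zeros(2), rng)
print(w, w.sum())  # two non-negative weights summing to 1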

Code for these models (in Keras) can be found here.

Training data for the JSB Chorales and Piano-midi corpora can be found in data/input. Songs have been transposed into C major or C minor (*_Cs.pickle) for comparison with previous work, or kept in their original keys (*_all.pickle).
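
As a quick way to inspect these files, here is a minimal sketch of loading one of them. It assumes the common layout for these corpora (a dict with 'train'/'valid'/'test' splits, where each song is a list of time steps and each time step is the collection of MIDI pitches sounding); the exact layout in this repo may differ:

import pickle

# Assumption: the file follows the common train/valid/test dict layout.
# These pickles are often written under Python 2, so encoding='latin1'
# may be needed when loading under Python 3.
with open('data/input/JSB Chorales_all.pickle', 'rb') as f:
    data = pickle.load(f, encoding='latin1')

song = data['train'][0]
print(len(song))  # number of time steps in the first training song
print(song[0])    # MIDI pitches sounding at the first time step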

Generated music samples

Samples from the models trained on the JSB Chorales and Piano-midi corpora, with songs in their original keys, can be found in data/samples.

Training new models

Example of training a Classifying VAE with 4 latent dimensions on JSB Chorales in two keys, and then generating a sample from this model:

$ python cl_vae/train.py run1 --use_x_prev --latent_dim 4 --train_file '../data/input/JSB Chorales_Cs.pickle'
$ python cl_vae/sample.py outfile --model_file ../data/models/run1.h5 --train_file '../data/input/JSB Chorales_Cs.pickle'