CJ Carr and Zack Zukowski recently launched a YouTube channel that streams a never-ending barrage of death metal generated by AI. Their Dadabots project uses a recurrent neural network to identify patterns in the music, predict the most common elements and reproduce them.
Source : https://www.engadget.com/2019/04/21/ai-generated-death-metal-stream/
Geeky technical detail from their paper:
We pre-process each audio dataset into 3,200 eight second chunks of raw audio data (FLAC). The chunks are randomly shuffled and split into training, testing, and validation sets. The split is 88% training, 6% testing, 6% validation.
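As a rough illustration of the shuffle-and-split step, here is a minimal sketch using only the Python standard library. The function name split_chunks, the fixed seed, and the use of integer IDs in place of real FLAC chunks are assumptions made for this example; the paper does not show its own preprocessing code.

```python
import random

def split_chunks(chunks, train_frac=0.88, test_frac=0.06, seed=0):
    """Shuffle chunks and split them into (train, test, validation) lists.

    Hypothetical helper: the validation set is whatever remains after
    the training and testing portions are taken.
    """
    shuffled = list(chunks)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid

# Stand-in for 3,200 eight-second audio chunks: just their indices.
chunk_ids = list(range(3200))
train, test, valid = split_chunks(chunk_ids)
print(len(train), len(test), len(valid))  # 2816 192 192
```

With 3,200 chunks, the 88% / 6% / 6% split works out to 2,816 training, 192 testing, and 192 validation chunks.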
31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
We use a 2-tier SampleRNN with 256 embedding size, 1024 dimensions, 5 to 9 layers, LSTM or GRU, 256 linear quantization levels, 16 kHz sample rate, skip connections, and a 128 batch size, using weight normalization. The LSTM gated units have a forget gate bias initialized with a large positive value of 3. The initial state h0 is either learned or randomized. We train each model for about three days on an NVIDIA K80 GPU. Intermittently at checkpoints during training, audio clips are generated one sample at a time and converted to a WAV file. Originally SampleRNN used an argmax inference method. We modified it to sample from the softmax distribution.
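To make the difference between argmax inference and sampling from the softmax distribution concrete, here is a small sketch in plain Python. It is not the SampleRNN code: the logits are made up, and the helper names pick_argmax and pick_sampled are hypothetical. The only detail taken from the text is that each output is one of 256 quantization levels.

```python
import math
import random

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_argmax(logits):
    """Greedy inference: always return the highest-scoring level."""
    return max(range(len(logits)), key=lambda i: logits[i])

def pick_sampled(logits, rng):
    """Stochastic inference: draw a level in proportion to its softmax probability."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

rng = random.Random(0)
logits = [0.0] * 256   # made-up scores over 256 quantization levels
logits[100] = 2.0      # most likely level
logits[101] = 1.5      # a close runner-up

greedy = [pick_argmax(logits) for _ in range(5)]
sampled = [pick_sampled(logits, rng) for _ in range(1000)]
print(greedy)             # the same level every time
print(len(set(sampled)))  # number of distinct levels drawn
```

Argmax returns the same level whenever the scores are the same, while sampling lets less likely levels through now and then, so the generated audio is less deterministic.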
At each checkpoint we generate ten 30-second clips. Early checkpoints produce generalized textures.
Later checkpoints produce riffs with sectional transitions. If after a few epochs it only produces white noise, restart the training.
Sometimes a checkpoint generates clips which always get trapped in the same riff. Listen for traps before choosing a checkpoint for longer generations.
The number of simultaneously generated clips (n_seq) doesn't affect the processing time, because they are generated in parallel. The number is limited by GPU memory.
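A quick back-of-the-envelope calculation shows why n_seq does not change the processing time. Audio is generated one sample at a time, so the number of sequential steps depends only on clip length and sample rate; extra clips add work within each step, not more steps. The sketch below uses the 30-second clips and 16 kHz rate mentioned above. The function names are hypothetical, and it assumes the GPU has enough memory to hold all clips in one batch.

```python
SAMPLE_RATE_HZ = 16_000  # from the text: 16 kHz sample rate
CLIP_SECONDS = 30        # from the text: 30-second clips

def sequential_steps(clip_seconds, sample_rate_hz):
    """Generation steps that must run one after another for a single clip."""
    return clip_seconds * sample_rate_hz

def total_samples(n_seq, clip_seconds, sample_rate_hz):
    """Samples produced across all clips in the parallel batch."""
    return n_seq * sequential_steps(clip_seconds, sample_rate_hz)

for n_seq in (1, 10):
    steps = sequential_steps(CLIP_SECONDS, SAMPLE_RATE_HZ)
    samples = total_samples(n_seq, CLIP_SECONDS, SAMPLE_RATE_HZ)
    print(f"n_seq={n_seq}: {steps} sequential steps, {samples} samples in total")
```

Going from 1 to 10 clips keeps the step count at 480,000 while the total number of samples grows from 480,000 to 4,800,000.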
https://dadabots.com/nips2017/generating-black-metal-and-math-rock.pdf