The data is a subset of Mozilla Common Voice; it was provided by the competition I entered. I can try to share it with you, just let me know. There are 3 classes: UK English, US English and Australian (as far as I remember). The final accuracy was about 80%-82%.
In the GitHub repo, the speech_cnn folder contains 4 .py files, and it is very easy to remove the Azure-related code. There you can find the model definition, the data augmentation, the training, and so on. It is a simple CNN model in Keras.
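To give a rough idea of what waveform-level data augmentation can look like, here is a minimal sketch. It is not the code from the repo: the function names (`time_shift`, `add_noise`, `augment`), the default values, and the 16 kHz sample rate are all assumptions made for illustration, and it uses only the Python standard library so it can run without Keras or Azure.

```python
import random


def time_shift(samples, max_shift, rng):
    """Circularly shift a waveform by a random offset in [-max_shift, max_shift]."""
    if not samples:
        return []
    # Python's modulo maps negative offsets to an equivalent right rotation.
    shift = rng.randint(-max_shift, max_shift) % len(samples)
    if shift == 0:
        return list(samples)
    return samples[-shift:] + samples[:-shift]


def add_noise(samples, noise_std, rng):
    """Add zero-mean Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def augment(samples, rng, max_shift=1600, noise_std=0.005):
    """Apply a random shift followed by additive noise.

    The defaults are hypothetical: 1600 samples is 0.1 s at an assumed
    16 kHz sample rate, and noise_std assumes samples scaled to [-1, 1].
    """
    return add_noise(time_shift(samples, max_shift, rng), noise_std, rng)


# Example: augment one second of silence with a seeded generator.
rng = random.Random(0)
clip = [0.0] * 16000
augmented = augment(clip, rng)
```

In practice you would probably do this with NumPy arrays and apply it on the fly during training, so that each epoch sees slightly different versions of every clip.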
If you need any help, I will do my best to help you.