Generation of synthetic speech from a neural network output
LE3 .A278 2017
Silver, Daniel L.
Bachelor of Computer Science
The primary goal of this research is to create a computer system that replicates the human ability to learn to speak: specifically, a computer representation of a speech system that can learn to reproduce the sounds it hears. The system has components analogous to the human speech system, using the Fast Fourier Transform (FFT) to emulate aspects of the human auditory system and GNU Speech’s articulatory speech synthesizer to emulate the human vocal tract. A genetic algorithm produces a speech vector that is passed to GNU Speech. Its fitness function compares the FFT of the audio coming from the speech synthesizer against the FFT of the training audio, and this comparison guides the development of candidates. The genetic algorithm is executed many times for many different speech sounds, and the resulting FFT-speech vector pairs are compiled into training examples used to train an artificial neural network. The genetic algorithm produces audio examples that are 71% recognizable to human listeners, and the artificial neural network models these examples with 75% accuracy. Experiments reveal that GNU Speech is limited in its ability to accurately reproduce human speech, so a portion of the error is due to this limitation. Experiments also reveal limits on the FFT’s capacity to represent sound for this application, which accounts for another part of the error.
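The loop described in the abstract (candidate speech vector, synthesized audio, spectral comparison against the target) can be illustrated with a minimal Python sketch. It is not the thesis implementation. Everything named here is an assumption made for illustration: `toy_synth` is a hypothetical stand-in for GNU Speech that simply sums a few sinusoids, a naive DFT stands in for the FFT, and the selection, crossover, and mutation settings are arbitrary.

```python
import cmath
import math
import random

N = 32  # samples per frame (toy size, an assumption)

def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0..n/2 (stand-in for an FFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]

def toy_synth(params):
    """Hypothetical synthesizer: params are amplitudes of harmonics 1..len(params)."""
    return [
        sum(a * math.sin(2 * math.pi * (h + 1) * t / N)
            for h, a in enumerate(params))
        for t in range(N)
    ]

def spectral_error(params, target_spectrum):
    """Fitness (lower is better): Euclidean distance between magnitude spectra."""
    spec = magnitude_spectrum(toy_synth(params))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec, target_spectrum)))

def evolve(target_spectrum, n_params=4, pop_size=20, generations=20, seed=0):
    """Elitist genetic algorithm; returns the best vector and its error history."""
    rng = random.Random(seed)
    population = [[rng.uniform(0, 1) for _ in range(n_params)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda p: spectral_error(p, target_spectrum))
        history.append(spectral_error(population[0], target_spectrum))
        parents = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params)   # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_params)        # mutate a single gene
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    population.sort(key=lambda p: spectral_error(p, target_spectrum))
    history.append(spectral_error(population[0], target_spectrum))
    return population[0], history

# Usage: recover a vector whose spectrum matches that of a known "training" sound.
target_params = [0.8, 0.1, 0.5, 0.3]
target_spectrum = magnitude_spectrum(toy_synth(target_params))
best, history = evolve(target_spectrum)
training_pair = (target_spectrum, best)  # one FFT-speech vector pair
```

Repeating this search for many target sounds would yield the kind of FFT-speech vector pairs the abstract describes as training examples for the neural network. Because the better half of each generation is carried over unchanged, the best error in this sketch never increases from one generation to the next.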
The author retains copyright in this thesis. Any substantial copying or any other actions that exceed fair dealing or other exceptions in the Copyright Act require the permission of the author.