ECTI TRANSACTIONS ON COMPUTER INFORMATION TECHNOLOGY, Volume 16, No. 02, June 2022, Pages 135-141
Person's facial image synthesis from audio with generative adversarial networks
Huzaifa Maniyar, Suneeta V. Budihal, Saroja V. Siddamal
Abstract
This paper proposes a framework based on Generative Adversarial Networks (GANs) that synthesizes a person's facial image from audio input. Images and speech are the two main channels of information exchange between two entities. In some data-intensive applications, large amounts of audio must be translated into an interpretable image format by an automated system, without human intervention. This paper presents an end-to-end model that reconstructs intelligible images from an audio signal. The model uses a GAN architecture that derives image features from audio waveforms and uses them for image synthesis. It was trained to generate facial images of the individual speakers whose identities appear in the training dataset. Images of the labelled persons are generated from excitation signals, and the method achieved an accuracy of 96.88% on ungrouped data and 93.91% on grouped data.
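The audio-to-image GAN idea can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the spectrogram encoder, all layer sizes, the 32x32 grayscale output, and every function name (`audio_embedding`, `generator`, `discriminator_logit`, `gan_losses`) are hypothetical, and the excitation-signal features mentioned in the abstract are not modelled. The sketch shows only the forward pass of a conditional GAN, in which both the generator and the discriminator see the audio embedding, together with the standard non-saturating GAN losses; actual training would additionally need gradients from an autodiff framework.

```python
import numpy as np

# Hypothetical sizes (assumptions; the paper does not specify them).
SAMPLE_RATE = 16_000            # assumed 16 kHz input, 1 s clips
FRAME, HOP = 400, 160           # 25 ms analysis windows with a 10 ms hop
N_BINS = FRAME // 2 + 1         # rfft bins per frame (201)
EMB_DIM, NOISE_DIM, HIDDEN = 64, 32, 256
IMG_SIDE = 32                   # 32x32 grayscale face (assumption)


def init_params(rng):
    """Random weights for the hypothetical encoder, generator and discriminator."""
    s = 0.05
    return {
        "enc": rng.normal(0, s, (N_BINS, EMB_DIM)),
        "g1": rng.normal(0, s, (EMB_DIM + NOISE_DIM, HIDDEN)),
        "g2": rng.normal(0, s, (HIDDEN, IMG_SIDE * IMG_SIDE)),
        "d1": rng.normal(0, s, (IMG_SIDE * IMG_SIDE + EMB_DIM, HIDDEN)),
        "d2": rng.normal(0, s, (HIDDEN, 1)),
    }


def audio_embedding(wave, p):
    """Log-magnitude spectrogram, mean-pooled over time, projected to EMB_DIM."""
    n_frames = 1 + (len(wave) - FRAME) // HOP
    idx = np.arange(FRAME)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(FRAME)                # windowed frames
    spec = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))  # (n_frames, N_BINS)
    return np.tanh(spec.mean(axis=0) @ p["enc"])          # (EMB_DIM,)


def generator(emb, z, p):
    """Map [audio embedding, noise] to an image with pixels in [-1, 1]."""
    h = np.maximum(0.0, np.concatenate([emb, z]) @ p["g1"])  # ReLU hidden layer
    return np.tanh(h @ p["g2"]).reshape(IMG_SIDE, IMG_SIDE)


def discriminator_logit(img, emb, p):
    """Score an (image, audio embedding) pair; higher means 'real and matching'."""
    h = np.maximum(0.0, np.concatenate([img.ravel(), emb]) @ p["d1"])
    return float((h @ p["d2"])[0])


def softplus(x):
    """Numerically stable log(1 + e^x)."""
    return np.logaddexp(0.0, x)


def gan_losses(real_img, wave, p, rng):
    """Non-saturating GAN losses for one (face, audio) pair, computed from logits."""
    emb = audio_embedding(wave, p)
    fake_img = generator(emb, rng.standard_normal(NOISE_DIM), p)
    l_real = discriminator_logit(real_img, emb, p)
    l_fake = discriminator_logit(fake_img, emb, p)
    d_loss = softplus(-l_real) + softplus(l_fake)  # -log D(real) - log(1 - D(fake))
    g_loss = softplus(-l_fake)                     # -log D(fake)
    return fake_img, d_loss, g_loss


# Usage with random stand-ins for a speech clip and a face image.
rng = np.random.default_rng(0)
params = init_params(rng)
wave = rng.standard_normal(SAMPLE_RATE)
real = np.tanh(rng.standard_normal((IMG_SIDE, IMG_SIDE)))
fake, d_loss, g_loss = gan_losses(real, wave, params, rng)
print(fake.shape, round(float(d_loss), 3), round(float(g_loss), 3))
```

Conditioning the discriminator on the audio embedding (rather than on the image alone) is one common way to make it penalize faces that look realistic but do not match the speaker; whether the paper uses this exact scheme is not stated in the abstract.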
Keywords: Generative Adversarial Network (GAN), Image Synthesis, Speech Processing, Deep Learning