ThaiScience  


ECTI TRANSACTIONS ON COMPUTER INFORMATION TECHNOLOGY


Volume 16, No. 2, June 2022, pp. 135-141


Person's facial image synthesis from audio with generative adversarial networks

Huzaifa Maniyar, Suneeta V. Budihal, Saroja V. Siddamal


Abstract

This paper proposes a framework based on Generative Adversarial Networks (GANs) to synthesize a person's facial image from audio input. Images and speech are the two main sources of information exchange between two entities. In some data-intensive applications, large amounts of audio must be translated into an understandable image format by an automated system, without human intervention. This paper presents an end-to-end model for reconstructing intelligible images from an audio signal. The model uses a GAN architecture that derives image features from audio waveforms for image synthesis. Trained on a dataset of individual identities, the model produces a synthesized facial image of a speaker from that speaker's audio. Images of labelled persons are generated from excitation signals, and the method achieved an accuracy of 96.88% on ungrouped data and 93.91% on grouped data.
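The abstract does not specify the audio features, network layers, or training objective, so the following is only a minimal sketch of how a conditional GAN might consume audio. It assumes frame-level log-energy features as the conditioning signal and the standard binary cross-entropy GAN losses; these are stand-ins, not the authors' method. The function names and the frame/hop sizes are hypothetical, and the networks themselves are omitted: discriminator outputs are passed in as probabilities.

```python
import math
import random


def frame_log_energy(waveform, frame_len=400, hop=160):
    """Split a waveform into overlapping frames and return each frame's log energy.

    Assumption: log energy stands in for whatever audio encoding the paper uses.
    """
    feats = []
    for start in range(0, len(waveform) - frame_len + 1, hop):
        frame = waveform[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        feats.append(math.log(energy + 1e-10))  # small offset avoids log(0) on silence
    return feats


def generator_input(audio_feats, noise_dim=8, seed=0):
    """Concatenate audio conditioning features with a Gaussian noise vector (conditional-GAN style)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return audio_feats + noise


def bce(p, target):
    """Binary cross-entropy for a single predicted probability p and a 0/1 target."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # clamp to keep the logarithms finite
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def discriminator_loss(d_real, d_fake):
    """D should score real face images as 1 and generated ones as 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """Non-saturating generator loss: push D(G(z, audio)) toward 1."""
    return bce(d_fake, 1.0)
```

For example, `discriminator_loss(0.5, 0.5)` equals 2 ln 2 (about 1.386), the loss when the discriminator cannot distinguish real from generated images. Concatenating audio features with noise is one common way to condition a generator; an actual system would feed this vector through a learned network rather than use it directly.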


Keywords

Generative Adversarial Network (GAN), Image Synthesis, Speech Processing, Deep Learning





Published by: ECTI Association
Contributions welcome at: http://www.ecti-thailand.org/paper/journal/ECTI-CIT