MuseTalk is a real-time, high-quality, audio-driven lip-syncing model trained in the latent space of ft-mse-vae. It modifies an unseen face according to the input audio, with a face region of 256 x 256, supports audio in various languages, such as Chinese, English, and Japanese, and supports real-time inference at 30fps+ on an NVIDIA ...