NVIDIA Omniverse Audio2Face Experimentation

May 9, 2024

Article

Ken Tai

This article chronicles my first experience with NVIDIA Omniverse, from installing the launcher to exporting a sequence in MP4 format using the Audio2Face application. It was an exciting and enlightening journey.


In my previous role as a UX architect at Microsoft, my team, "Customer Innovation," focused on the Industrial Metaverse. My mission was to support both clients and the team in envisioning use cases within the utilities, automotive, and manufacturing sectors. In 2022, I started hearing about and participating in internal sessions on Omniverse. As a UX specialist with a visual design background and a sci-fi gaming enthusiast, I was always drawn to 3D and industrial design, and I was particularly intrigued by the real-time collaboration capabilities of the virtual and digital world. Technically, industrial metaverse and digital twin data are grounded in the real physical world, yet I noticed an intersection between the industrial and gaming spaces, especially in their simulation components.


Since then, and particularly in 2024, I've noticed a significant increase in the number of articles about NVIDIA Omniverse, as well as a plethora of free tutorials on LinkedIn and YouTube. This has given me greater insight into the platform's rapid development and the exciting innovations taking place.


This experience marks the start of my journey into exploring Omniverse. I'm eager to dive deeper into its workflows, seeking innovations that will enhance human capabilities.

The Goals


My goals were to:

  1. Set up an Omniverse account and install necessary apps.

  2. Understand how Omniverse and USD (Universal Scene Description) function, along with their workflows.

  3. Create a video where an AI model (provided by the default setting of Audio2Face) speaks about MOKUJIRO's main message, as found on the MOKUJIRO website.

Platforms Used and Experience


Google Cloud

I converted text to speech in .WAV format, with a couple of configuration options to specify the voice gender and change the accent from American to Australian. This could have been done with AWS Polly or IBM Watson, but I chose Google Cloud because it had the easiest registration process.
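For reference, here is a minimal sketch of that text-to-speech call using Google Cloud's Python client. It assumes the google-cloud-texttospeech package and authenticated credentials; the input text and output filename are placeholders, and the voice parameters mirror the gender and Australian-accent settings described above:

    # Minimal sketch: Google Cloud Text-to-Speech to a .WAV file.
    # Assumes google-cloud-texttospeech is installed and
    # GOOGLE_APPLICATION_CREDENTIALS points at a valid key file.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    # Placeholder script for the AI model to speak.
    synthesis_input = texttospeech.SynthesisInput(text="Welcome to MOKUJIRO.")

    # en-AU selects the Australian accent; the gender is set explicitly.
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-AU",
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    )

    # LINEAR16 returns uncompressed PCM with a WAV header.
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16,
    )

    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    with open("mokujiro_voice.wav", "wb") as f:
        f.write(response.audio_content)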


Omniverse Launcher

This is the gateway to the Omniverse platform, providing download, installation, and access to all Omniverse apps, connectors, and related utilities.


Nucleus Navigator

It provides a simple directory interface for accessing servers, projects, and files, including a deep search capability. It helped me understand the folder structure of servers and assets, including my own localhost. Since I've only just started using Omniverse, it seems very simple, but I can imagine it becoming helpful once I start producing lots of files.


Audio2Face

The latest version was released in December 2023. The app is a combination of AI-based technologies that generate facial animation and lip sync driven solely by an audio source. Character retargeting enables users to connect and animate their own characters.

This app was my primary focus for this experimentation. I configured the audio player with the track root path and adjusted the settings for emotions, eye movement, lower denture, and tongue animation. Finally, I exported the cache in USD format.
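To sanity-check the export, here is a minimal sketch, assuming Pixar's pxr Python module (available via the usd-core package) and a placeholder filename, that opens the cache, prints its frame range, and lists its mesh prims:

    # Minimal sketch: inspect an exported Audio2Face USD cache.
    # Assumes the usd-core package; the filename is a placeholder.
    from pxr import Usd, UsdGeom

    stage = Usd.Stage.Open("a2f_cache.usd")

    # The frame range should correspond to the length of the audio track.
    print("frames:", stage.GetStartTimeCode(), "-", stage.GetEndTimeCode())

    # Walk the scene graph and list the mesh prims carrying the animation.
    for prim in stage.Traverse():
        if prim.IsA(UsdGeom.Mesh):
            print(prim.GetPath())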


Machinima

It empowers animated storytelling. New extensions let you assemble sequences with characters, props, and cameras, and AI-based pose estimation and Audio2Face make character animation fluid. I watched a couple of YouTube videos on how to export the sequence in MP4 format with the audio, but with the Movie Capture feature I could only render the animation without the audio.


Clipchamp

As I couldn't export the animation with audio, I stitched the animation and the audio track together in Clipchamp. I also added captions.
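For anyone who would rather script this step, the silent render and the WAV file could also be combined with ffmpeg, here driven from Python. This is a minimal sketch, assuming ffmpeg is installed and on the PATH; all filenames are placeholders:

    # Minimal sketch: mux the silent Machinima render with the TTS audio.
    # Assumes ffmpeg is available on the PATH; filenames are placeholders.
    import subprocess

    subprocess.run(
        [
            "ffmpeg",
            "-i", "machinima_render.mp4",  # silent video from Movie Capture
            "-i", "mokujiro_voice.wav",    # audio track from text-to-speech
            "-c:v", "copy",                # keep the video stream untouched
            "-c:a", "aac",                 # encode the audio to AAC for MP4
            "-shortest",                   # stop at the end of the shorter stream
            "output_with_audio.mp4",
        ],
        check=True,
    )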


YouTube

To embed the video in this Framer website, I needed a hosted URL for the video, so I created a YouTube account and uploaded it. I also realized that I could add captions within the YouTube editor.