NVIDIA Omniverse Audio2Face Experimentation


9 May 2024

Article

Ken Tai

This article chronicles my first experience with NVIDIA Omniverse, from installing the launcher to exporting a sequence in MP4 format using the Audio2Face application. It was an exciting and enlightening journey.


In my previous role as a UX architect at Microsoft, my team, "Customer Innovation," focused on the Industrial Metaverse. My mission was to support both clients and the team in envisioning use cases within the utilities, automotive, and manufacturing sectors. In 2022, I started hearing about and participating in internal sessions on Omniverse. As a UX specialist with a visual design background and a sci-fi gaming enthusiast, I have always been drawn to 3D and industrial design, and I was particularly intrigued by the real-time collaboration capabilities in the virtual and digital world. While industrial metaverse and digital twin data are grounded in the real physical world, I noticed an intersection between the industrial and gaming spaces, especially in their simulation components.


Since then, in 2024, I've noticed a significant increase in the number of articles about NVIDIA Omniverse, as well as a plethora of free tutorials on LinkedIn and YouTube. This has provided me with greater insight into the platform's rapid development and the exciting innovations taking place.


This experience marks the start of my journey into exploring Omniverse. I'm eager to dive deeper into its workflows, seeking innovations that will enhance human capabilities.

The Goals


My goals were to:

  1. Set up an Omniverse account and install necessary apps.

  2. Understand how Omniverse and USD (Universal Scene Description) function, and how their workflows fit together.

  3. Create a video in which a digital character (the default model provided in Audio2Face) speaks about MOKUJIRO's main message, as found on the MOKUJIRO website.

Platforms Used and Experience


Google Cloud

I converted text to speech in WAV format, with a couple of configuration changes to specify the voice gender and switch the accent from American to Australian. This could have been done with Amazon Polly or IBM Watson, but I chose Google Cloud because it had the easiest registration process.
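
For reference, the same configuration can be expressed with Google Cloud's Python client library. The snippet below is only a minimal sketch, assuming the google-cloud-texttospeech package is installed, application-default credentials are set up, and placeholder text stands in for the MOKUJIRO message:

# Minimal sketch of the text-to-speech configuration with the Google Cloud
# Python client (assumes `pip install google-cloud-texttospeech` and that
# application-default credentials are already configured).
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# Placeholder text; in my case this was MOKUJIRO's main message.
synthesis_input = texttospeech.SynthesisInput(text="Your script goes here.")

# Switch the accent from the default American English to Australian English
# and specify the voice gender.
voice = texttospeech.VoiceSelectionParams(
    language_code="en-AU",
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
)

# LINEAR16 encoding produces uncompressed WAV-style audio for Audio2Face.
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16
)

response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

with open("voice_track.wav", "wb") as out:
    out.write(response.audio_content)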


Omniverse Launcher

This is the gateway to the Omniverse platform, providing download, installation, and access to all Omniverse apps, connectors, and related utilities.


Nucleus Navigator

It provides a simple directory interface for accessing servers, projects, and files, including a deep-search capability. It helped me understand the folder structure of the localhost server and its assets. Since I've only just started using Omniverse, my setup is still very simple, but I can imagine Navigator becoming genuinely useful once I start producing lots of files.


Audio2Face

The latest version was released in December 2023. The app is a combination of AI-based technologies that generate facial animation and lip sync driven solely by an audio source. Character retargeting enables users to connect and animate their own characters.

This app was my primary focus for this experimentation. I configured the audio player with the track root path and adjusted settings for emotions, eye movements, the lower denture, and tongue animation. Finally, I exported the resulting animation cache in USD format.
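
Because the export is a plain USD file, it can also be opened and sanity-checked outside Omniverse. Below is a minimal sketch using the USD Python bindings, assuming the usd-core package and a placeholder name for the exported cache file:

# Small sketch for inspecting the exported animation cache with the USD
# Python API (assumes `pip install usd-core`; the file name is a placeholder).
from pxr import Usd

stage = Usd.Stage.Open("audio2face_cache.usd")

# Print the baked animation's time range and the prims contained in the cache.
print("Animation range:", stage.GetStartTimeCode(), "-", stage.GetEndTimeCode())
for prim in stage.Traverse():
    print(prim.GetPath(), prim.GetTypeName())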


Machinima

It empowers animated storytelling: new extensions let you assemble sequences from characters, props, and cameras, while AI-based pose estimation and Audio2Face keep character animation fluid. I watched a couple of YouTube videos on how to export a sequence in MP4 format with audio, but using the Movie Capture feature I could only render the animation without the audio.


Clipchamp

Since I couldn't export the animation with audio, I stitched the animation and the audio track together in Clipchamp. I also added captions.
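
Clipchamp worked fine, but for reference the same muxing step can also be scripted. The snippet below is a minimal sketch that calls ffmpeg from Python, assuming ffmpeg is on the PATH and using placeholder names for the silent Movie Capture render and the WAV track:

# Minimal sketch of muxing the silent Movie Capture render with the WAV track
# by calling ffmpeg via subprocess (assumes ffmpeg is installed and on PATH;
# file names are placeholders).
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "machinima_render.mp4",   # video-only render from Movie Capture
    "-i", "voice_track.wav",        # audio generated with Google Cloud TTS
    "-c:v", "copy",                 # keep the video stream as-is
    "-c:a", "aac",                  # encode the WAV to AAC for the MP4 container
    "-shortest",                    # stop at the end of the shorter stream
    "final_with_audio.mp4",
], check=True)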


YouTube

To embed the video into this Framer website, I needed a hosted URL for it, so I created a YouTube account and uploaded the video. I realized that I could also add captions within the YouTube editor.

Takeaways


Initial Impression with Omniverse Audio2Face

Setting up an Omniverse account and understanding how this powerful platform works was a great experience. I was particularly impressed by its generative AI capabilities, which have the potential to enhance accessibility and significantly speed up the process of creating realistic animations with human-like movements and expressions.


Ethical Considerations at the Forefront

However, as these AI-generated outcomes become increasingly lifelike, we must consider the ethical implications and ensure responsible development and use of such technologies. Addressing concerns around privacy, bias, and the potential misuse of highly realistic synthetic media will be crucial.


Initial Exploration: Just the Beginning

So far, I've only scratched the surface of Audio2Face's functionalities within Omniverse. While my goal for this experiment was to get onboarded to the Omniverse ecosystem and create an initial output – which I consider successful – I haven't yet explored many of the advanced features this tool offers.



Next Steps


Use Case Exploration

Humanising and anthropomorphising digital assets can significantly enhance accessibility and improve comprehension of digital content. I am excited to collaborate with my clients to ideate and develop use cases that address their business challenges.


Blender Integration

Moving forward, I plan to use the 'Blender 4.2 alpha USD branch' to retarget Omniverse's AI features onto 3D objects I've created in Blender. This will allow me to incorporate these AI-generated animations and expressions into my personalized 3D models and scenes.
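
As a starting point for that workflow, a Blender model can be exported to USD from Blender's own Python API so it can be brought into Omniverse for retargeting. The snippet below is only a minimal sketch meant to run inside Blender; the file path is a placeholder and the exact exporter options vary between Blender versions:

# Minimal sketch: export the currently selected Blender object(s) to USD so
# they can be opened in Omniverse. Run from Blender's Python console or a
# script; the file path is a placeholder.
import bpy

bpy.ops.wm.usd_export(
    filepath="C:/projects/mokujiro/blender_character.usd",
    selected_objects_only=True,   # export only the selected character mesh
    export_materials=True,        # include material assignments
)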

Additionally, I aim to experiment with incorporating personalized audio tracks, further enhancing the realism and customization of my animations.


Audio2Gesture Exploration

Another exciting feature I'll be exploring is 'Audio2Gesture' in Machinima. This tool promises to generate realistic full-body movements and gestures based on audio inputs, potentially opening up new avenues for creating engaging and expressive character animations.

By combining these powerful AI tools with my existing 3D modeling and animation skills, I'm eager to push the boundaries of what's possible and create truly immersive and lifelike digital experiences.