Alibaba's AI video generator just dunked on Sora by making the Sora lady sing
Alibaba wants you to compare its new AI video generator to OpenAI's Sora. Otherwise, why use it to make Sora's most famous creation belt out a Dua Lipa song?
On Tuesday, an organization called the "Institute for Intelligent Computing" within the Chinese e-commerce juggernaut Alibaba released a paper about an intriguing new AI video generator it has developed that's shockingly good at turning still images of faces into passable actors and charismatic singers. The system is called EMO, a fun backronym supposedly drawn from the words "Emotive Portrait Alive" (though, in that case, why is it not called "EPO"?).
EMO is a peek into a future where a system like Sora makes video worlds, and rather than being populated by attractive mute people just kinda looking at each other, the "actors" in these AI creations say stuff — or even sing.
Alibaba put demo videos on GitHub to show off its new video-generating framework. These include a video of the Sora lady — famous for walking around AI-generated Tokyo just after a rainstorm — singing "Don't Start Now" by Dua Lipa and getting pretty funky with it.
The demos also reveal how EMO can, to cite one example, make Audrey Hepburn speak the audio from a viral clip of Riverdale's Lili Reinhart talking about how much she loves crying. In that clip, Hepburn's head maintains a rather soldier-like upright position, but her whole face — not just her mouth — really does seem to emote the words in the audio.
In contrast to this uncanny version of Hepburn, Reinhart in the original clip moves her head a whole lot, and she also emotes quite differently, so EMO doesn't seem to be a riff on the sort of AI face-swapping that went viral back in the mid-2010s and led to the rise of deepfakes in 2017.
Over the past few years, applications designed to generate facial animation from audio have cropped up, but they haven't been all that inspiring. For instance, the NVIDIA Omniverse software package touts an app with an audio-to-facial-animation framework called "Audio2Face" — which relies on 3D animation for its outputs rather than simply generating photorealistic video like EMO.
Despite Audio2Face only being two years old, the EMO demo makes it look like an antique. In a video that purports to show off Audio2Face's ability to mimic emotions while talking, the 3D face it depicts looks more like a puppet in a facial expression mask, while EMO's characters seem to express the shades of complex emotion that come across in each audio clip.
It's worth noting at this point that, like with Sora, we're assessing this AI framework based on a demo provided by its creators, and we don't actually have our hands on a usable version that we can test. So it's tough to imagine that right out of the gate this piece of software can churn out such convincingly human facial performances based on audio without significant trial and error, or task-specific fine-tuning.
The characters in the demos mostly aren't expressing speech that calls for extreme emotions (faces screwed up in rage, or melting down in tears, for instance), so it remains to be seen how EMO would handle heavy emotion with audio alone as its guide. What's more, despite being made in China, it's depicted as a total polyglot, capable of picking up on the phonetics of English and Korean, and making the faces form the appropriate phonemes with decent, though far from perfect, fidelity. So in other words, it would be nice to see how well EMO performed if you fed it audio of a very angry person speaking a lesser-known language.
Also fascinating are the little embellishments between phrases — pursed lips or a downward glance — that insert emotion into the pauses rather than just the times when the lips are moving. These are examples of how a real human face emotes, and it's tantalizing to see EMO get them so right, even in such a limited demo.
According to the paper, EMO's model relies on a large dataset of audio and video (once again: from where?) to give it the reference points necessary to emote so realistically. And its diffusion-based approach apparently doesn't involve an intermediate step in which 3D models do part of the work. A reference-attention mechanism and a separate audio-attention mechanism are paired by EMO's model to provide animated characters whose facial animations match what comes across in the audio while remaining true to the facial characteristics of the provided base image.
It's an impressive collection of demos, and after watching them it's impossible not to imagine what's coming next. But if you make your money as an actor, try not to imagine too hard, because things get pretty disturbing pretty quick.