ByteDance unveils 2 new video-generation AI models to narrow gap with OpenAI’s Sora

The new Doubao-PixelDance and Doubao-Seaweed large language models will be available from next month

ByteDance’s new video-generation models reflect how Chinese tech firms have been making aggressive moves into this nascent AI market segment. Photo: Shutterstock
Coco Feng in Beijing
TikTok owner ByteDance has launched two new large language models (LLMs) – the technology underpinning generative artificial intelligence (AI) applications like ChatGPT – designed for creating videos based on text and image prompts, as Chinese tech firms look to catch up with the advances made by OpenAI’s Sora.
The new Doubao-PixelDance and Doubao-Seaweed LLMs – part of the Doubao family of AI models, which share the same name as the Doubao chatbot that ByteDance introduced last year – will be available in early October, according to Tan Dai, president of ByteDance cloud unit Volcano Engine.

The Doubao-PixelDance model, which is able to handle complex and sequential motions, can produce 10-second videos, while the Doubao-Seaweed model can generate clips of up to 30 seconds, according to Volcano Engine’s website.

The addition of video-generation AI models to the Doubao LLM family “has benefited from the capabilities of understanding videos accumulated by Douyin and Jianying over the years”, Tan said at an event in Shenzhen on Tuesday, referring to the Chinese version of TikTok and ByteDance’s popular video-editing app known as CapCut outside the mainland.
Tan Dai, president of ByteDance cloud unit Volcano Engine, presents two new video-generation artificial intelligence models at an event in Shenzhen on Tuesday. Photo: Handout

Tan’s demonstration at the event showed that both new AI models were able to generate videos simulating real-life scenes, such as a first-person view of driving a car, as well as fictional scenes, such as a winged frog in flight and a floating island.


He said the new models maintain “stability” in subject and style when a video cuts from one shot to another, something that remains a major challenge for other video-generation LLMs.
