Zen Video: Using AI to automate video editing

Jan 27, 2020

By Hua Nan

Founded by a Carnegie Mellon roboticist, the Zen Video app reduces the time required to edit video clips to only a few minutes, meeting growing demand for short videos

Making a video clip used to require professional equipment and a skilled team for the shoot and post-production. It was a time-consuming process: producing a one-minute video required anywhere from two to four hours of editing. In recent years, soaring demand for short videos has sent video producers in search of more efficient video-making tools.

Hangzhou-based Zen Video has since filled the gap by shortening the video-making process to just a few minutes. Founded in December 2016, the startup developed a cloud-based AI platform that lets users enter text – whether several words or a long article – to automatically generate a video clip.

Using natural language processing (NLP) technology, the platform analyzes the text and extracts keywords, then searches its library for the videos and pictures that best fit each sentence and assembles them into a raw video clip. Other relevant materials are recommended and displayed alongside the automatically generated raw clip. Users can further edit the clip by selecting points along the video's timeline and inserting additional material.
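
As a rough illustration of the workflow described above, the Python sketch below splits text into sentences, pulls out keywords and picks the library item whose tags overlap most with each sentence. It is not Zen Video's implementation: the stopword list, the `Asset` type, the tag-overlap scoring and the two sample library entries are all assumptions made for the example, and a production system would rely on far richer NLP models.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {"a", "an", "the", "and", "of", "in", "on", "at", "to",
             "is", "are", "with", "over"}


@dataclass(frozen=True)
class Asset:
    """A library item (video or picture) described by keyword tags."""
    name: str
    tags: frozenset


def split_sentences(text):
    """Split text after sentence-ending punctuation; drop empty pieces."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extract_keywords(sentence):
    """Return the lowercase words of a sentence that are not stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def best_asset(keywords, library):
    """Pick the asset sharing the most tags with the keywords, or None."""
    best, best_score = None, 0
    for asset in library:
        score = len(keywords & asset.tags)
        if score > best_score:
            best, best_score = asset, score
    return best


def build_raw_clip(text, library):
    """Pair every sentence with its best-matching asset (or None)."""
    return [(s, best_asset(extract_keywords(s), library))
            for s in split_sentences(text)]


# Hypothetical sample library; names and tags are invented.
LIBRARY = [
    Asset("sunrise_lake.mp4", frozenset({"sunrise", "lake", "morning"})),
    Asset("night_market.jpg", frozenset({"market", "night", "food"})),
]
```

Calling `build_raw_clip("The sunrise over the lake is calm. Crowds fill the night market.", LIBRARY)` returns one (sentence, asset) pair per sentence. A sentence with no matching tags is paired with `None`, which loosely corresponds to a spot where a user would insert material by hand.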

“You can make videos as long as you know how to surf the Web – no professional, powerful video-editing hardware needed,” said Zen Video founder Kang Hongwen.

Embracing the short video era

Short videos exploded on the social media scene in China between 2015 and 2016, with Douyin (the Chinese version of TikTok), Kuaishou and other short video platforms quickly establishing their stronghold in the mobile world. Traditional media outlets have also been scrambling to get up to speed with this trend by using short videos to attract eyeballs. According to iResearch Consulting Group, China’s short video market was valued at RMB 5.73bn in 2017, up 183.9% year on year, and is predicted to surpass RMB 30bn by 2020.

Kang enrolled in Carnegie Mellon University in 2006 to study computer science under the guidance of computer scientist Takeo Kanade, one of the world’s earliest researchers of computer vision. Kang obtained his PhD in Robotics in 2012. From 2015, he led a team with expertise in NLP and other AI technologies in streamlining the process of making TV programs at Hunan Television and Zhejiang Television, two TV channels popular with young audiences for their entertainment programs.

Back then, he noticed that a one-hour variety show often entailed over 1,000 hours of video editing before it was ready for broadcast. Much of that time went to tedious, repetitive operations that machines could carry out. “When video content is successful, 90% of the credit goes to creative ideas and only 5% to manual labor. But the 5% manual labor usually consumes most of the time,” Kang said.

Seeing that the traditional video-editing model was struggling to meet the rising demand for user-created content by freelance videographers, Kang's team turned the idea of building an AI-powered video-making platform into reality in late 2016. “What we wanted to create was a central hub of AI + video + cloud platform,” he said.

To expand quickly, Zen Video targets clients that make videos at great frequency and volume. Short video apps such as Pear Video and Miaopai, ByteDance’s news platform Jinri Toutiao, and streaming service Tencent Video are among the institutional clients using Zen Video to enhance their video-making efficiency. Pear Video’s editing team selects and edits photos and videos that its thousands of content providers nationwide upload onto the Zen Video platform.

Tech innovation for social media

Zen Video is not the only player in the arena. It is one of Pear Video's two partners for AI-powered video creation. The other is New York-based Wochit, which was founded in 2012.

As the demand for short videos rises in China, more domestic competitors have joined the game. Weibo, one of China’s top social media platforms, launched its version, Weibo Yunjian, in June 2017. A month later, Alibaba Cloud launched its video editing platform, ApsaraVideo.

Zen Video has continued its R&D efforts to maintain its technological edge. In July 2017, it released an API for video analysis, making it available to third-party developers, media and content providers.

At the API's launch event, Kang said that the API can deconstruct a video clip with frame-by-frame precision, enabling better dissection and understanding of the video content.

He has also set his sights on using the API to convert content from “self-media” or “we-media” – independently operated social media accounts on platforms such as WeChat and Weibo – into short videos. He believes self-media operators need to change the way they reach out to followers in an era of short videos.

“On average, less than 2% of the subscribed followers of a user's WeChat official account will actually read the articles posted on the account. Consumers’ reading interest has shifted to watching videos. Operators of WeChat official accounts need to transform,” Kang said.