SenseTime unveils updated large language model

Writer: Yang Yunfei | Editor: Zhang Zeling | Updated: 2024-07-08

SenseTime Group Inc., once a leader in artificial intelligence (AI)-powered facial recognition, on Friday unveiled the latest version of its large language model at the World Artificial Intelligence Conference (WAIC) in Shanghai, China’s biggest annual AI conference.

U.S.-sanctioned SenseTime released a series of updated versions of its SenseNova large model, including SenseNova 5.5, its latest foundational model.

The performance of the latest iteration of SenseTime’s large model has improved 30% compared with SenseNova 5.0, the previous version released in April this year. SenseNova 5.5 is touted as a rival to U.S. AI firm OpenAI's GPT-4o in areas such as mathematical reasoning, English proficiency, instruction following and interactive effects.

Specifically, compared with the previous version, SenseNova 5.5’s mathematical reasoning has improved 31.5%, its English proficiency has grown 53.8% and its instruction-following ability has advanced 26.8%.

GPT-4o, which is capable of realistic voice conversation and able to interact across text and image, was introduced by OpenAI in May.

Citing data from OpenCompass, a platform for benchmarking large models, Xu Li, SenseTime co-founder and CEO, said that SenseNova 5.5 has outperformed GPT-4o in five of eight key metrics.

SenseNova 5.5 adopts a hybrid cloud-edge collaborative expert architecture to maximize the “Cloud-to-Edge” synergy and reduce inference costs, according to SenseTime.

The model was trained on more than 10TB tokens of high-quality training data, including a large amount of synthetically generated reasoning chain data, which helps to enhance its reasoning capabilities, the firm said.

In live demonstrations held at the WAIC on Friday, SenseNova 5o, which is based on SenseNova 5.5, was capable of achieving real-time, streaming multimodal interactions.

When a SenseTime staff member greeted the AI model, it could read the words on the badge he wore over his T-shirt and identify that he was attending the World Artificial Intelligence Conference. The staff member then presented a cute puppy toy, and SenseNova 5o accurately described the appearance, expression and attire of the puppy. The AI model can also summarize the content of a written page placed in front of it.

OpenAI and Google have given similar demonstrations at their product launches.

SenseTime said in a statement Saturday that SenseNova 5o is the first large AI model in China to realize a new means of human-AI interaction by integrating cross-modal information such as sound, text, images and video in real time. It also provides a new AI interaction model on par with GPT-4o’s streaming interaction capabilities.

Chatbots need better reasoning abilities, more natural interaction and better-controlled knowledge generation in order to allow for more AI breakthrough moments, Xu said.

"This is a critical year for large models as they evolve from unimodal to multimodal,” he said. “With applications driving the development of models and their capabilities, coupled with technological advancements in multimodal streaming interactions, we will witness unprecedented transformations in human-AI interactions."

SenseTime has recently rolled out the "Project $0 Go" scheme, offering new users of SenseNova free services such as induction, migration and training as well as 50 million training tokens. It also promises to send consultants to help OpenAI users migrate to SenseNova without charge.

SenseTime, which previously focused on facial recognition technology but has shifted its attention to generative AI following the release of ChatGPT by Microsoft-backed OpenAI in late 2022, is among a growing number of Chinese corporations and startups exploring ways to develop an answer to ChatGPT.

It joined Baidu Inc. and Alibaba Group Holding Ltd. in developing its own in-house generative AI platform, launching the first SenseNova in April 2023. The firm was among the earliest batch of Chinese companies to win government approval to roll out services to the public.

Co-founded 10 years ago in Hong Kong by former Chinese University of Hong Kong professor Tang Xiao’ou, a pioneer in China’s AI industry who died in December last year at the age of 55, SenseTime has undergone rapid growth over the past decade and has been referred to as one of China’s “four little dragons” of AI, along with Cloudwalk Technology, Megvii and Yitu, due to its advanced facial recognition technology.

Hong Kong-listed SenseTime, which has yet to turn a profit, saw revenue from its generative AI business triple last year to 1.18 billion yuan (US$163.4 million), an increase of 199.9% from a year earlier. The firm is now aiming to turn profitable in the next two years, as its generative AI business is expected to become the company’s new profit engine, CEO Xu said in May.

Xu anticipated more than 100% growth this year for the firm’s generative AI business, which has been deployed in sectors ranging from telecommunications to financial institutions, with clients such as China Merchants Bank, Haitong Securities and China Telecom.