
SZ-developed AI tool adapts large model to Huawei chip in 38 minutes

Writer: Claudia Wei  |  Editor: Lin Qiuying  |  From: Original  |  Updated: 2026-03-10

A Shenzhen startup has dramatically shortened the time it takes to adapt cutting-edge AI models to domestic chips. Zhizi Xinyuan’s KernelCAT, an AI agent tool, completed the full automated deployment and inference validation of the DeepSeek-OCR-2 model on Huawei’s Ascend platform in just 38 minutes, a feat company engineers say would previously have taken weeks or months of manual work.

KernelCAT parsed the model structure, performed automated troubleshooting, generated a dynamic migration plan and conducted repeated hardware tests without human intervention. The result, the company says, effectively solves a long-standing adaptation problem: getting large models originally designed for foreign processors to run efficiently on China's homegrown NPU chips.

Ding Tian, co-founder of Zhizi Xinyuan, speaks during a sharing session at the 27th China Hi-Tech Fair in Shenzhen in November 2025. Courtesy of Zhizi Xinyuan

“AI models are like a complex set of instructions, and different chips speak different languages,” said Ding Tian, research scientist at the Shenzhen Institute of Big Data and co-founder of Zhizi Xinyuan. He compared the old process to trying to run a Windows-only game on a Mac: “You’d have to wait months for special adaptation. Ordinary users couldn’t do it themselves.” Ding added that KernelCAT’s automation makes the same migration possible in the time it takes to finish a cup of coffee.

The technical bottleneck largely centers on operators: the low-level “translators” that convert model operations into chip-native instructions. Historically, engineers hand-wrote operator code line by line and were frequently caught in version conflicts among components such as vLLM, PyTorch and NPU drivers. DeepSeek-OCR-2, a complex multimodal OCR model that incorporates a “visual causal flow” design, imposes particularly high demands on operator quality and version compatibility.

According to Ascend platform data, KernelCAT's optimizations delivered up to a 139-fold inference speedup for the first-generation DeepSeek-OCR model compared with native deployment approaches. The result suggests that complex OCR models can not only run on domestic compute platforms, but also run stably and at high performance.

Zhizi Xinyuan was incubated by the Shenzhen Institute of Big Data, one of the city's major basic research institutions. The institute says its "math + AI" approach, which combines advanced operations research with modern large-model techniques, helped the team overcome what it calls the "last mile" problem of aligning algorithms with operators.

The work has strategic implications. Domestic chips are central to ensuring autonomous computing infrastructure for sensitive fields such as defense and healthcare. Long-term dependence on foreign processors poses supply-chain and security risks. Ding described KernelCAT as both a migration tool and a general compute-acceleration development platform that could help engineers build higher-performance models directly on domestic silicon.

Shenzhen’s broader AI ecosystem — home to more than 2,600 AI companies and supported by city government policies — provides a fertile environment for startups like Zhizi Xinyuan to scale.
