DeepSeek: The low-profile Chinese AI startup shaking up the LLM price war
1/70th the cost of GPT-4 Turbo, on-par performance: how did this dark horse rise?
For years, China has been seen as a master of imitation: not a source of groundbreaking innovation, but a land of clever copycats and fast iteration. From apps to algorithms, the country's reputation for creating knock-offs has dominated global conversations. But there's a twist. What if China could do more than just replicate?
In recent years, the chip embargo imposed by the US has made it harder for Chinese companies to push the boundaries of AI development. Yet it is under exactly these circumstances that a company called DeepSeek has risen seemingly out of nowhere, defying all expectations.
Founded in May 2023 by a team backed by China's quant hedge fund giant High-Flyer, DeepSeek released its latest open-source LLM, DeepSeek V3, which now competes head-to-head with the likes of GPT-4o, Llama 3.1, and Claude 3.5. But here's the kicker: this astonishing progress was made on H800 chips, in under two years, while cutting costs to roughly 1/7 of Llama 3 70B and 1/70 of GPT-4 Turbo. The team achieved this by innovating on the model's architecture. In math and coding, it even outperforms the hottest models currently coming out of Silicon Valley.
The founder, Liang Wenfeng, has a refreshingly candid and almost geeky style, brimming with an unwavering passion for technological innovation. Unlike other major tech companies in China—such as Tencent, Alibaba, and ByteDance—that are also developing LLMs, Liang's team stands out for its sole focus on the model itself, rather than pursuing both the model and applications simultaneously. Despite not prioritizing commercialization or monetization, the team has managed to remain profitable.
No fundraising, no flash—just results. How did this dark horse come to be?
More importantly, while the DeepSeek team has achieved impressive milestones, it is not yet the world's top performer. But its speed of development has raised important questions and pushed the tech and investing community to reconsider China's role in the AI race. Does it signal inefficiency and bubbles in the current LLM market? Or could this be the beginning of a new chapter in Chinese innovation, one where it is not just about copying, but about truly creating?
In today's post, we translate an exclusive interview between Liang Wenfeng, DeepSeek's founder, and "Waves," a news outlet by 36Kr (NASDAQ: KRKR). The interview offers insights into not just how the team achieved such impressive results, but also the philosophy behind the company's founding and their vision for China's technological landscape in the future. We will also upload a video comparing DeepSeek with OpenAI's model in the upcoming week on our newest YouTube channel. Stay tuned.
Below is the translation of the original post by the Waves.
DeepSeek Unveiled: A Story of Extreme Technological Idealism from China
Among the seven major AI model startups in China, DeepSeek (深度求索) is the quietest, yet it always manages to make an impression in surprising ways.
A year ago, the surprise came from High-Flyer, the quantitative hedge fund giant that backs DeepSeek and was the only non-tech giant with a stockpile of over 10,000 A100 chips. A year later, the surprise came from the fact that DeepSeek was the catalyst for China's AI model price war.
Amid the crowded AI melee of May, DeepSeek skyrocketed to fame with the release of its open-source model, DeepSeek V2, which offered an unprecedented cost-performance ratio: inference cost was cut to just 1 RMB per million tokens, roughly one-seventh of Llama 3 70B and one-seventieth of GPT-4 Turbo.
DeepSeek quickly earned the nickname “the Pinduoduo of AI,” and soon, giants like ByteDance, Tencent, Baidu, and Alibaba followed suit, lowering their prices. This sparked China’s AI price war.
Yet beneath the smoke of this price war, a key fact often gets overlooked: unlike many tech giants burning money on subsidies, DeepSeek is profitable.
This success stems from DeepSeek's extensive innovation in model architecture. The company introduced a new MLA (Multi-Head Latent Attention) architecture that reduces the memory footprint of the attention KV cache to just 5%-13% of that of the widely used MHA architecture. In addition, its proprietary DeepSeekMoE sparse structure pushes computation requirements to a minimum, ultimately driving down costs.
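For readers curious how a "latent" attention scheme saves memory, here is a minimal NumPy sketch of the general idea behind latent KV compression. It is an illustrative assumption-laden toy, not DeepSeek's implementation: the dimensions are made up, and details such as RoPE handling, query-side compression, and training are omitted. The point it demonstrates is that the cache stores one small latent vector per token instead of full per-head keys and values.

```python
# Toy sketch of latent KV compression (the idea behind MLA), using NumPy.
# All dimensions below are illustrative assumptions, not DeepSeek's real config.
import numpy as np

d_model, n_heads, d_head = 1024, 16, 64   # toy model width and head layout
d_latent = 128                            # compressed KV latent width (assumed)

rng = np.random.default_rng(0)
W_dkv = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)            # down-projection
W_uk  = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)  # up-projection to K
W_uv  = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)  # up-projection to V
W_q   = rng.standard_normal((d_model, n_heads * d_head)) / np.sqrt(d_model)    # query projection

def latent_attention_step(h_new, latent_cache):
    """Cache one token's compressed latent, then attend over the whole cache."""
    latent_cache.append(h_new @ W_dkv)                 # cache only a d_latent-dim vector
    C = np.stack(latent_cache)                         # (seq_len, d_latent)
    K = (C @ W_uk).reshape(len(C), n_heads, d_head)    # reconstruct keys on the fly
    V = (C @ W_uv).reshape(len(C), n_heads, d_head)    # reconstruct values on the fly
    q = (h_new @ W_q).reshape(n_heads, d_head)
    scores = np.einsum('hd,thd->ht', q, K) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum('ht,thd->hd', weights, V).reshape(-1), latent_cache

# Per-token cache size: standard MHA stores K and V for every head; the latent
# scheme stores a single compressed vector.
print("MHA cache floats per token:", 2 * n_heads * d_head)  # 2048
print("Latent cache floats per token:", d_latent)           # 128, ~6% of MHA in this toy setup
```

With these toy numbers the cache shrinks to about 6% of the MHA baseline, which happens to fall inside the 5%-13% range cited above, though the real ratio depends entirely on the chosen latent width and head configuration.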
In Silicon Valley, DeepSeek is dubbed “the mysterious force from the East.” SemiAnalysis’ chief analyst considers the DeepSeek V2 paper “perhaps the best of the year.” Andrew Carr, a former OpenAI employee, called the paper “full of amazing wisdom” and applied its training settings to his own models. Jack Clark, former policy director at OpenAI and co-founder of Anthropic, stated that DeepSeek “employed a group of incredibly talented minds,” and believes that Chinese-made large models “will become as powerful as drones and electric vehicles, impossible to ignore.”
This level of recognition is rare in an AI field dominated by Silicon Valley. Several industry insiders told us that the strong reaction was due to the groundbreaking nature of DeepSeek's innovation at the architectural level, something rare even among global open-source AI models. An AI researcher noted that since the Attention architecture was introduced, very few people have dared to modify it, let alone validate a change at scale. "An idea like this would get shut down in the decision-making process, because most people lack the confidence."
On the other hand, Chinese AI companies have rarely ventured into architectural innovation, partly due to the prevailing belief that the U.S. excels at pioneering 0-to-1 technological breakthroughs, while China specializes in 1-to-100 application innovation. Moreover, the returns on such innovation seem too distant: a new generation of models will inevitably emerge within a few months, and Chinese companies can simply follow and excel at application. Innovating on model structure means charting an unknown path, with many potential failures and a heavy cost in time and money.
DeepSeek is clearly a contrarian. Amidst a chorus of voices insisting that model technologies will inevitably converge and that following others is the smarter, faster path, DeepSeek values the lessons learned on these "detours" and believes Chinese model builders can contribute not only application innovation but also original work to the global wave of technological breakthroughs.
Many of DeepSeek's decisions stand apart from the crowd. To date, it is the only startup among China's seven major AI model companies that has chosen to focus solely on research and technology rather than pursuing both models and applications. It has not built consumer-facing (to-C) applications, nor has it fully embraced commercialization or raised outside funding. As a result, DeepSeek is often overlooked by the mainstream, yet frequently spread by word of mouth within the AI community.
So, how did DeepSeek come to be? To answer this, we interviewed the reclusive founder, Liang Wenfeng.
Liang, an 80s-born founder who has been behind the scenes since his days at High-Flyer, continues his low-key style at DeepSeek. Like all of his researchers, he spends his days “reading papers, writing code, and participating in group discussions.” Unlike many quantitative fund founders who have experience in overseas hedge funds and backgrounds in physics or mathematics, Liang is a homegrown talent, having studied AI in the Department of Electronic Engineering at Zhejiang University.
Industry insiders and DeepSeek researchers tell us that Liang is an exceptionally rare figure in China’s AI scene: someone who combines strong infrastructure engineering skills with expertise in model research and the ability to marshal resources. “He can make precise top-level judgments and is stronger than frontline researchers in the details,” they said. “His learning ability is terrifying,” while he “doesn’t come across as a boss, but rather as a geek.”
This is a rare interview. In it, this technological idealist offers a voice that is particularly scarce in China's tech community: he is one of the few who prioritize "principles over profits" and urges us to recognize the inertia of the times (Baiguan's note: the prevailing belief that China has historically been better at manufacturing than at original innovation), pushing for "original innovation" to take center stage.
A year ago, when DeepSeek first launched, we interviewed Liang Wenfeng for "The Madness of High-Flyer: The Path of a Hidden AI Giant's Large Model." If his line from that interview, "We must embrace ambition with madness, and also embrace sincerity with madness," sounded like a beautiful slogan back then, it has since become a call to action.
Below is the transcript of the interview between journalists from "Waves" and Liang Wenfeng, the founder of DeepSeek.