William “Chip” Usher is the Senior Director for Intelligence at the Special Competitive Studies Project. Prior to SCSP, Chip served 32 years in the Central Intelligence Agency where he held a variety of executive positions. Chip is a former member of the Senior Intelligence Service and has expertise on East Asia, the Near East, and Eurasia.
EXPERT Q&A — The sudden and rapid success of China’s DeepSeek, and its new artificial intelligence (AI) assistant, has caused a storm in the U.S. tech and business communities. The Chinese AI startup – and many outside experts – say the model can compete with those produced by top American AI leaders, and at a fraction of the cost.
The breakthrough sparked investigations by OpenAI and Microsoft into whether DeepSeek had copied information from their models to develop its own. There is also the question of how DeepSeek developed the model amid U.S. AI chip export controls aimed at curbing China’s AI industry. And whatever the answers are to those questions, there is the concern that DeepSeek’s success may carry national security concerns as well.
The Cipher Brief turned to Chip Usher, Senior Director for Intelligence at the Special Competitive Studies Project, to understand exactly how DeepSeek pulled off its AI coup – and what the long-term implications may be in the tech and national security realms.
Usher spoke with Cipher Brief CEO Suzanne Kelly. Their conversation has been edited for length and clarity. You can also watch their full discussion on our YouTube channel.
Kelly: You were one of the first people who I thought of when I heard this news about DeepSeek. Could you help us unpack it a little bit?
Usher: What occurred, really going back to around Christmas time, is this Chinese AI lab called DeepSeek revealed and published papers about models that it had created. The first was called V3, and the most recent one is called R1. In the case of R1, which is the latest one that they released, its performance is comparable to OpenAI's most recent model, o1, and it exceeds the performance of Llama, which is Meta's AI model. That in and of itself is significant.
On top of that, it's how they went about it — the way that they created their model was faster and cheaper than what American companies have done. That is what has the markets riled up, and it's what has policymakers in Washington and around the world looking at this very seriously. Over the last year, as we've watched OpenAI and Meta and Anthropic release their models and build more and more data centers to train them, we've been hearing figures of hundreds of billions of dollars. These frontier models consume vast amounts of electricity, and they consume vast numbers of Nvidia's most advanced GPU chips.
In the case of DeepSeek, what they managed to accomplish with V3, which they announced around Christmas time, is they trained it in just two months. And they trained it on GPUs that are known as H800s — these are Nvidia chips that were exported legally to China in the wake of the October 2022 export ban announced by the Biden administration. That ban really targeted one of the most advanced chips that Nvidia makes, the H100s, but it didn’t preclude the sale of H800s. So, it’s very interesting that this Chinese company was able to squeeze an enormous amount of performance out of inferior chips — and fewer of them. To give you a comparison, the published reports I’ve seen indicate that they used about 50,000 H800 chips. The more recent OpenAI efforts used about 500,000 H100 chips. A vast difference in compute power.
DeepSeek was able to do this in significantly less time because they put together some techniques that AI labs have been developing and are using, but in some novel and unique ways. The first thing that they did is they used a technique called mixture of experts, or MOE.
Think about a company like General Electric (GE), which makes all sorts of things — dishwashers, aircraft engines, radars. Imagine that every time an outsider knocks on GE's door with a query, whether it's "I need to buy a jet engine" or "I want to sell you paperclips," that query has to pass through the entire GE organization before an answer is delivered. That's roughly analogous to how OpenAI and some of our closed-model systems operate: on brute computing power.
With a mixture of experts, when you walk into that reception room at GE and say, "I want to buy a jet engine," the receptionist immediately directs you to the expert who can deliver the jet engine you require. The technique is not unknown to American companies, which have been developing it for some time, but DeepSeek really seems to have refined the process. It speeds up the response, and it reduces the amount of compute and energy required to deliver an answer.
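The receptionist analogy can be sketched in a few lines of code. This is a minimal, illustrative toy (not DeepSeek's actual architecture): a "router" picks the one relevant expert for each query, so the rest of the system stays idle rather than processing every request.

```python
# Toy mixture-of-experts routing. In a real model the router is a learned
# layer scoring experts per token; here it is faked with a keyword check.

def jet_engine_expert(query):
    return f"jet-engine answer to: {query}"

def dishwasher_expert(query):
    return f"dishwasher answer to: {query}"

EXPERTS = {
    "engine": jet_engine_expert,
    "dishwasher": dishwasher_expert,
}

def router(query):
    # Stand-in for a learned gating function.
    return "engine" if "engine" in query else "dishwasher"

def moe_forward(query):
    # Only the selected expert runs, saving compute compared with
    # sending the query through the entire organization.
    return EXPERTS[router(query)](query)

print(moe_forward("I want to buy a jet engine"))
```

The efficiency gain comes from activating only a fraction of the model's parameters per query, which is why the approach reduces both compute and energy per answer.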
The other method DeepSeek used is called reinforcement learning. The typical AI models that we are used to playing with — ChatGPT, o1, and Claude 3.5 Sonnet — are trained using reinforcement learning, which is somewhat akin to how you would train your pet dog: you ask it to do a trick, and when it performs that trick, you give it a reward. That early approach of blunt reinforcement learning was very effective, and in fact it was the underpinning of what DeepMind accomplished when AlphaGo beat the world's best Go player several years back; AlphaGo was a reinforcement learning approach.
In the time since AlphaGo, newer techniques have emerged where the model is not just handed a reward until, like the pet dog, it eventually learns the trick; it is also given guidance about which approaches look promising, so it can iterate toward a better response. That has proven to be a somewhat more efficient and faster way to train an AI model. It's another approach that DeepSeek used.
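The "train your pet dog" version of reinforcement learning can be shown in a toy loop. This is purely illustrative (not DeepSeek's R1 training recipe): the learner tries tricks, only one trick earns a reward, and the estimated value of each trick is nudged toward the rewards actually received until the rewarded trick dominates.

```python
import random

# Toy reward-based learning (a simple multi-armed bandit).
ACTIONS = ["sit", "roll", "fetch"]
values = {a: 0.0 for a in ACTIONS}   # learned value estimate per trick
LEARNING_RATE = 0.1

def reward(action):
    # Only "fetch" is the trick we want; it alone earns a reward.
    return 1.0 if action == "fetch" else 0.0

random.seed(0)
for _ in range(500):
    # Epsilon-greedy: mostly exploit the best-known trick, sometimes explore.
    if random.random() < 0.2:
        action = random.choice(ACTIONS)
    else:
        action = max(values, key=values.get)
    # Nudge the value estimate toward the observed reward.
    values[action] += LEARNING_RATE * (reward(action) - values[action])

best = max(values, key=values.get)
print(best)
```

After enough trials the learner settles on the rewarded action, which is the essence of the reward loop; the newer techniques described above add guidance about intermediate steps on top of this basic mechanism.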
In the days since DeepSeek made its announcement, OpenAI, backed up by Microsoft, has accused DeepSeek of possibly using OpenAI's o1 to train its model — a process called distillation. In fact, our companies use distillation all the time: when they release a model, they often follow it with a smaller, cheaper, consumer-facing model that is trained from the bigger one. It appears DeepSeek may have done that in this instance, which got its model up to speed more quickly. So in a way, they may be free riding on the work that OpenAI and others pioneered.
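Distillation itself is a standard technique and can be sketched simply. In this illustrative toy (whether DeepSeek actually did this remains an accusation, not an established fact), a small "student" is fitted to reproduce a large "teacher" model's outputs instead of being trained from scratch on raw data.

```python
# Toy distillation: fit a cheap student (a line, w*x + b) to a
# teacher's answers rather than to original training data.

def teacher(x):
    # Stand-in for a large, expensive model; here just an exact function.
    return 3.0 * x + 1.0

# 1) Query the teacher to build a synthetic training set.
dataset = [(float(x), teacher(float(x))) for x in range(-10, 11)]

# 2) Fit the student to the teacher's outputs with plain gradient descent.
w, b = 0.0, 0.0
lr = 0.01
for _ in range(2000):
    for x, y in dataset:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err

print(round(w, 2), round(b, 2))  # student recovers the teacher's behavior
```

The student never sees the teacher's internals or original data; it learns purely from the teacher's input-output behavior, which is why access to a capable model's responses can be enough to bootstrap a competitor.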
This is not to denigrate the accomplishment, but it does suggest that the amount of compute that DeepSeek and other Chinese companies will have available to them in the future will still matter.
Kelly: Should this be a wake-up call for national security professionals here?
Usher: We have entered a period of intensified innovation competition, primarily between the United States and the People’s Republic of China, but not those two actors alone. This is emblematic of the sort of thing that we’re going to face more and more frequently in the years ahead.
And it’s not the first time. We don’t have to go that far back to other recent examples of Chinese breakthroughs that shocked the marketplace and shocked the national security community. Think back to 2023, when Huawei released their Mate 60 Pro smartphone, timed perfectly to when our commerce secretary was paying a visit to Beijing. And the performance of that smartphone was remarkably good, though it has since proven not to be as good as our best smartphones. But it certainly was a strong showing and it shocked people.
DeepSeek is similar. I think the upshot is, it shows how quickly China can close the gap if a gap exists. And that ought to be concerning to the United States, not just for commercial and competitive reasons, but also for national security reasons.
Kelly: What does this mean, do you think, to the intelligence community in particular?
Usher: Well, you know, here at SCSP (the Special Competitive Studies Project), we have been sounding the alarm that we as an intelligence community need to devote more time, attention, and resources to what we have termed techno-economic intelligence. This is nothing especially new to the IC — it's been done for years — but it needs more attention and more resources so that policymakers down the line are not caught by surprise by events like this.
I’m not privy to the current intelligence reporting, but judging by what we’ve seen in the open source with regard to the market’s reaction, and policymakers’ reaction, I would hazard a guess that our intelligence community did not identify DeepSeek as having been on the threshold of a major AI breakthrough. Where we would like to see the IC land in the not-too-distant future is doing exactly that sort of thing. It needs to shift how it collects against techno-economic issues like this, who it collects against, and how it goes about analyzing and assessing that data.
DeepSeek’s model is openly available for others to unpack and understand, unlike the models from OpenAI, Anthropic, and some other U.S. labs, which are closed-source. This is a real boon to hostile cyber and disinformation actors around the globe. China, Iran, North Korea – these actors are now going to have, at very little cost and with very little effort, access to a very capable AI model, and they can iterate on it to develop specialized models to conduct cyber and disinformation operations at scale. And it’s already happening – I saw a recent report that there are literally dozens of cyber actors inside China that have already been using Google’s Gemini AI to write malicious code and to search for network vulnerabilities in foreign countries. Well, now with DeepSeek’s model, this capability is going to be diffused even further, and faster. So we need to guard against AI-enabled cyber and disinformation operations, even more than before.
And the last point I would make is that the accusation that DeepSeek may have learned or benefited from U.S.-developed models underscores that we need to be able to defend national security AIs when we create them. We cannot have Chinese users logging on and cheating their way ahead by training their models on U.S. national security models. It absolutely raises the importance of defending our systems.