DeepSeek's surprisingly inexpensive AI model is challenging industry giants. The Chinese startup, a subsidiary of the High-Flyer hedge fund, claims to have trained its DeepSeek V3 model for just $6 million using only 2,048 GPUs. That contrasts sharply with the reported $100 million cost of training OpenAI's GPT-4o. The reality, however, is more nuanced.
Image: ensigame.com
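DeepSeek V3 combines several techniques: Multi-token Prediction (MTP), which trains the model to predict several upcoming tokens at once to improve accuracy and efficiency; Mixture of Experts (MoE), which splits the model into 256 expert networks and activates only a few of them for each token; and Multi-head Latent Attention (MLA), which compresses attention data so the model can retain crucial details at a lower memory cost. Together, these give the model its competitive edge.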
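To make the Mixture of Experts idea concrete, here is a minimal sketch of generic top-k routing in plain Python. It is an illustration only, not DeepSeek's actual router: the softmax gate, the `TOP_K = 8` setting, the toy scalar "experts", and the function names `route` and `moe_output` are all assumptions made for this example. Only the count of 256 experts comes from the article.

```python
import math
import random

NUM_EXPERTS = 256  # expert count quoted in the article
TOP_K = 8          # assumption for illustration: experts activated per token


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_scores, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in chosen]


def moe_output(token, router_scores, experts, top_k=TOP_K):
    """Weighted sum of the selected experts' outputs for a single token."""
    return sum(w * experts[i](token) for i, w in route(router_scores, top_k))


# Toy setup: each hypothetical "expert" just scales its input by a fixed factor.
rng = random.Random(0)
experts = [lambda x, s=rng.uniform(0.5, 1.5): s * x for _ in range(NUM_EXPERTS)]
scores = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]

selected = route(scores)
print(len(selected), "of", NUM_EXPERTS, "experts active for this token")
print("combined output:", moe_output(1.0, scores, experts))
```

The point of the design is that only the selected experts run for a given token, so compute per token stays small even though the total parameter count is large.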
The $6 million claim tells only part of the story. According to a SemiAnalysis report, DeepSeek operates a massive infrastructure of approximately 50,000 Nvidia GPUs (including H800, H100, and H20 units) across multiple data centers, representing a total investment of roughly $1.6 billion and operational costs of $944 million. The company also pays substantial salaries, with some researchers earning over $1.3 million annually. The $6 million figure reflects only the GPU cost of pre-training and excludes R&D, refinement, data processing, and infrastructure.
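The arithmetic behind the headline number can be sketched in a few lines. The GPU-hour total and the $2 hourly rental rate below are not from this article; they are the figures given in DeepSeek's own V3 technical report, and the helper names are hypothetical. The 2,048-GPU cluster size, the $6 million headline, and the $1.6 billion investment come from the text above.

```python
GPU_HOURS = 2_788_000          # H800 GPU-hours, per the DeepSeek-V3 technical report
USD_PER_GPU_HOUR = 2.0         # rental price assumed in that report
NUM_GPUS = 2048                # cluster size quoted in the article
HEADLINE_COST_USD = 6e6        # rounded figure quoted in the article
TOTAL_INVESTMENT_USD = 1.6e9   # SemiAnalysis estimate quoted in the article


def training_cost_usd(gpu_hours, usd_per_hour):
    """GPU rental cost only: no salaries, R&D, data, or infrastructure."""
    return gpu_hours * usd_per_hour


def wall_clock_days(gpu_hours, num_gpus):
    """Days of continuous running if all GPUs are busy the whole time."""
    return gpu_hours / num_gpus / 24


cost = training_cost_usd(GPU_HOURS, USD_PER_GPU_HOUR)
days = wall_clock_days(GPU_HOURS, NUM_GPUS)
share_pct = HEADLINE_COST_USD / TOTAL_INVESTMENT_USD * 100

print(f"GPU rental cost: ${cost / 1e6:.3f} million")
print(f"Wall-clock time on {NUM_GPUS} GPUs: about {days:.1f} days")
print(f"Headline cost as a share of total investment: {share_pct:.3f}%")
```

Under these assumptions, the headline figure amounts to well under one percent of the estimated total investment, which is why it says little about what the company actually spent.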
DeepSeek's success stems from significant investment (over $500 million in AI development), technological breakthroughs, and a highly skilled team. Its lean structure helps it innovate quickly, but the "budget-friendly" narrative is an oversimplification. Because the company is self-funded and owns its data centers, it holds significant advantages over competitors that rely on cloud providers. DeepSeek is also notable for relying on domestic talent, with no foreign specialists on staff.
Even with the true costs clarified, DeepSeek's achievement shows that well-funded independent AI companies can compete effectively with established players. The gap between the company's overall investment and the initially publicized training cost is a reminder that early cost claims deserve scrutiny and that AI development is far more complex than a single number suggests.