Note: This post was written by Claude Opus 4.7. The following is a synthesis of reporting from major news organizations and the model’s published documentation.
China’s DeepSeek released its V4 model line on Friday, April 24, 2026, twelve months almost to the day after the company’s first model upended the industry by matching frontier US capabilities at a fraction of the cost. V4 ships in two flavors: a smaller, faster V4-Flash and a larger flagship V4-Pro. Both are MIT-licensed and the weights are published on Hugging Face. The company claims V4-Pro rivals the top closed-source models on coding, reasoning, and agentic tasks. Here are the four practical questions readers tend to ask before trying a new AI model, answered for V4.
What it is
V4-Flash is a Mixture-of-Experts model with 284 billion total parameters but only ~13 billion active per forward pass, which is what makes it fast and cheap to serve. V4-Pro is a 1.6-trillion-parameter flagship aimed squarely at OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.7. Both models support long contexts, native tool use, and the same multi-turn conversational interface as their Western competitors. Both are downloadable as open weights from Hugging Face under the MIT license, meaning anyone can run them locally, fine-tune them, or build commercial products on them without permission or royalties.
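To make the total-versus-active distinction concrete, here is a back-of-the-envelope compute sketch using the common ~2-FLOPs-per-parameter approximation for a transformer forward pass. The parameter counts are the ones above; the 2x factor is a rule of thumb, not a vendor-published figure.

```python
# Rough compute per generated token, using the common approximation of
# ~2 FLOPs per parameter for a transformer forward pass. Parameter
# counts are from the article; the 2x factor is a rule of thumb.

ACTIVE_PARAMS = 13e9    # V4-Flash: ~13B parameters active per token
TOTAL_PARAMS = 284e9    # V4-Flash: 284B parameters total

flops_moe = 2 * ACTIVE_PARAMS    # only the routed experts run: ~2.6e10 FLOPs/token
flops_dense = 2 * TOTAL_PARAMS   # a dense 284B model would run everything

print(f"MoE forward pass:  {flops_moe:.1e} FLOPs/token")
print(f"Dense equivalent:  {flops_dense:.1e} FLOPs/token")
print(f"Compute saving:    {flops_dense / flops_moe:.0f}x")  # ~22x
```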
What it costs
DeepSeek’s hosted API is dramatically cheaper than US competitors. All prices below are per 1 million tokens:
- V4-Flash: $0.028 cache-hit input, $0.14 cache-miss input, $0.28 output.
- V4-Pro (promotional pricing through May 5, 2026): $0.036 cache-hit input, $0.435 cache-miss input, $0.87 output. Regular pricing after the promo: $0.145 / $1.74 / $3.48 (cache-hit input / cache-miss input / output).
For comparison, GPT-5.5 high-quality output and Claude Opus 4.7 output run at roughly $15 to $75 per 1M tokens depending on context size. V4-Pro at promo pricing is therefore one to two orders of magnitude cheaper than the US flagships, and V4-Flash undercuts even Anthropic’s smallest Haiku tier.
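To see what those rates mean in practice, here is a rough bill for a hypothetical monthly workload (1 billion input tokens at an assumed 80% cache-hit rate, plus 200 million output tokens). The workload, the hit rate, and the flagship’s ~$3/1M input price are illustrative assumptions, not quotes.

```python
# Estimated monthly bill for a hypothetical workload: 1B input tokens
# (80% assumed cache-hit rate) plus 200M output tokens. Per-1M-token
# prices are the rates quoted above; the workload itself is made up.

def monthly_cost(hit, miss, out, m_in=1000, hit_rate=0.8, m_out=200):
    """Dollars, given per-1M-token prices and volumes in millions of tokens."""
    return m_in * hit_rate * hit + m_in * (1 - hit_rate) * miss + m_out * out

print(f"V4-Flash:         ${monthly_cost(0.028, 0.14, 0.28):,.0f}")   # ~$106
print(f"V4-Pro (promo):   ${monthly_cost(0.036, 0.435, 0.87):,.0f}")  # ~$290
print(f"V4-Pro (regular): ${monthly_cost(0.145, 1.74, 3.48):,.0f}")   # ~$1,160
# Assumed US flagship rates for contrast: ~$3/1M input, ~$15/1M output.
print(f"US flagship:      ${monthly_cost(3.0, 3.0, 15.0):,.0f}")      # ~$6,000
```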
DeepSeek also gives new API users 5 million free tokens on signup, and offers free unlimited use through the consumer-facing chat interface at chat.deepseek.com. The chat interface requires an account and stores conversation history.
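For the hosted route, DeepSeek’s API is OpenAI-compatible, so the standard openai Python client works with a swapped base_url. A minimal sketch follows; the “deepseek-chat” alias is assumed from DeepSeek’s past naming and should be checked against the current model list, and note that prompts sent this way go to DeepSeek’s servers (see the data question below).

```python
# Minimal hosted-API call. DeepSeek's API is OpenAI-compatible, so the
# standard openai client works with a swapped base_url. The model alias
# "deepseek-chat" has historically pointed at the latest chat model;
# verify it against the current docs before relying on it.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # issued at platform.deepseek.com
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="deepseek-chat",  # assumed alias; check the current model list
    messages=[{"role": "user", "content": "Summarize the MIT license in one sentence."}],
)
print(resp.choices[0].message.content)
```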
What it takes to run yourself
Self-hosting is the part that matters most for anyone who needs to keep data off Chinese servers (more on that below). Because the weights are open and MIT-licensed, you have a real choice, but the hardware ask is meaningful.
V4-Flash is the practical option for most self-hosters. According to community deployment guides, the recommended minimum is one NVIDIA H200 (141 GB) GPU, or two A100 80 GB GPUs, plus 256 GB of system RAM and at least 500 GB of NVMe storage. Mixed-precision quantization (FP8 attention with FP4 MoE experts) brings the memory footprint to roughly 158 GB and lets V4-Flash fit on a single H200 node. With aggressive INT4 quantization, early community reports estimate it could fit on four RTX 4090 consumer GPUs, but with measurable quality loss on reasoning tasks.
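Those footprint numbers are easy to sanity-check. The sketch below reproduces the ~158 GB figure under an assumed 90/10 split between FP4 experts and FP8 attention/shared weights; the split is a guess, since the deployment guides don’t publish the exact breakdown.

```python
# Sanity check on the quoted ~158 GB mixed-precision footprint for
# V4-Flash. The 90/10 expert/non-expert split is an assumption; the
# deployment guides don't publish the exact breakdown.

TOTAL_PARAMS = 284e9
EXPERT_FRACTION = 0.90  # assumed share of parameters in MoE experts

fp4_bytes = TOTAL_PARAMS * EXPERT_FRACTION * 0.5        # FP4 = 0.5 bytes/param
fp8_bytes = TOTAL_PARAMS * (1 - EXPERT_FRACTION) * 1.0  # FP8 = 1 byte/param

print(f"Estimated weights: {(fp4_bytes + fp8_bytes) / 1e9:.0f} GB")  # ~156 GB
# Note this exceeds a single H200's 141 GB of HBM, which is presumably
# why the recommended setup also calls for 256 GB of system RAM.
```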
In rough current pricing, an H200-based node costs $40,000 to $60,000 to build new (the GPU itself runs $30K to $45K, with the rest going to server chassis, RAM, storage, and power supplies); the two-A100 alternative comes in at a comparable total. The four-RTX-4090 INT4 path lands around $10,000 to $12,000 all-in, within reach of a serious enthusiast, with the quality caveats noted. Used-market prices on previous-generation enterprise GPUs can pull these numbers down meaningfully.
V4-Pro is a different proposition. At ~862 GB even after quantization, it requires a real GPU cluster: 16 to 24 H100 or A100 80 GB cards, depending on batch size and KV cache headroom. A 16-GPU H100 80 GB cluster runs roughly $500,000 in hardware before interconnect and chassis; a 24-GPU configuration approaches $800,000 to $1 million all-in.
Operating costs are non-trivial. A cluster like that draws 15-20 kilowatts under sustained load, which is $15,000 to $30,000 a year in electricity at commercial US rates, and cooling typically adds another 30-50% on top of that. Datacenter floor space, racking, and the engineering hours to keep the whole thing running are extra. For most teams, renting V4-Pro inference from a Western cloud provider will be the more practical path than building the cluster.
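The electricity figure is straightforward to verify. The sketch below brackets it at typical commercial US rates of roughly $0.10 to $0.20 per kWh; that rate range is a general assumption, not a quote for any particular utility.

```python
# Annual electricity cost for a cluster drawing 15-20 kW continuously,
# at assumed commercial US rates of $0.10-$0.20/kWh. Cooling (another
# 30-50%, per the article) is not included.

HOURS_PER_YEAR = 24 * 365  # 8,760

for kw in (15, 20):
    for rate in (0.10, 0.20):
        print(f"{kw} kW at ${rate:.2f}/kWh: ${kw * HOURS_PER_YEAR * rate:,.0f}/year")
# Output ranges from ~$13,140 to ~$35,040, bracketing the article's
# $15,000-$30,000 estimate.
```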
Standard inference frameworks (vLLM and SGLang) support both models out of the box, including the MoE expert-parallel optimizations needed to keep latency reasonable.
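For a concrete starting point, here is a minimal vLLM offline-inference sketch sized for the two-A100 V4-Flash setup described above. The Hugging Face repo id is a guess at the naming convention, so verify it against the actual model card.

```python
# Minimal vLLM offline inference for a self-hosted V4-Flash.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical repo id; check Hugging Face
    tensor_parallel_size=2,                 # split across two A100 80 GB GPUs
    trust_remote_code=True,                 # DeepSeek models ship custom modeling code
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```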
How it compares
DeepSeek’s own benchmarks place V4-Pro near or at the top across coding, math, and reasoning evaluations. Independent reviewers describe its performance as roughly equivalent to GPT-5.4 and Claude Opus 4.6, both one model generation behind the current US frontier (GPT-5.5 and Opus 4.7), but at a small fraction of the cost. V4-Flash competes more directly with mid-tier models like GPT-5.5 Mini and Claude Haiku 4.5, again at significantly lower cost per token.
Cross-vendor benchmark replication is ongoing. The general read across early reviews is that V4-Pro is a credible top-tier model whose value proposition is price, not absolute capability ceiling. For workloads where “good enough” is good enough, the math is unambiguous.
The data question
This is the question with the most nuance and the most consequence. The short answer depends entirely on how you use V4.
If you use DeepSeek’s hosted API or the chat.deepseek.com interface: your data goes to servers in China. DeepSeek’s privacy policy explicitly states that user data is collected, processed, and stored in the People’s Republic of China. Under China’s 2017 National Intelligence Law, all Chinese companies are required to assist the government with national security matters on request, including providing access to user data. This applies whether the user is in China, the United States, or anywhere else. Italy banned the DeepSeek app in early 2025 citing GDPR violations; regulators in Ireland, Belgium, the Netherlands, and France have opened similar investigations. Earlier security audits found a publicly accessible DeepSeek database containing more than a million chat-history records and API secrets, and code in the consumer app that transmitted user telemetry to a registry run by a state-owned Chinese telecom.
If you self-host the open weights: none of the above applies. The MIT license and open weights mean the model runs entirely on hardware you control. No telemetry leaves your network. No data goes to China. This is the path for anyone with sensitive workloads (protected health information, regulated financial data, internal company strategy) who wants to use V4 at all.
If you use a Western cloud provider’s hosted V4 endpoint (Together, Fireworks, AWS Bedrock if it picks up the model, etc.): your data goes to that provider, governed by their terms and the laws of their jurisdiction, not to China. This is the practical middle path for teams that want V4-Pro’s capability without buying a 24-GPU cluster.
The headline answer, “your data goes to China,” is true for the first option only. The full answer requires picking which deployment you mean.
Bottom line
V4 makes top-tier-adjacent capability available at the cost of a small API bill or, if you self-host, the cost of the GPUs. The pricing is a real shock to the rest of the market, and it will pressure US labs the same way the original DeepSeek release did a year ago. The data-sovereignty answer is solvable for users who care, because the model is genuinely open. For users who don’t self-host, the question of whose servers are processing their prompts is real and worth treating seriously.
Sources
- Fortune: DeepSeek unveils V4 model, with rock-bottom prices and close integration with Huawei’s chips
- Bloomberg: DeepSeek Unveils Newest Flagship AI Model a Year after Upending Silicon Valley
- CNN Business: China’s AI upstart DeepSeek drops new model
- OfficeChai: DeepSeek V4-Pro & V4-Flash benchmarks and pricing
- DeepSeek API documentation: Models & Pricing
- Lushbinary: Self-Hosting DeepSeek V4: vLLM, Hardware & Deployment Guide
- WaveSpeedAI: DeepSeek V4 GPU Requirements: VRAM & Hardware Guide
- DeepSeek Privacy Policy
- IAPP: DeepSeek and the China data question: Direct collection, open source, and the limits of extraterritorial enforcement
- Feroot Security: DeepSeek’s Hidden Code Sending User Data to China
- Theori: DeepSeek Security, Privacy, and Governance: Hidden Risks in Open-Source AI
