GPU cost is the underlying compute cost of training and running AI models. It is dominated by Nvidia's H100, H200, B200, and B300 chips, which cost roughly $25,000-$60,000 each. GPU availability and cost are the limiting factor for AI training because foundation model labs need thousands of GPUs running together in clusters. The GPU supply chain is the single largest infrastructure story of the 2020s tech boom. Behind every AI capability is a stack of expensive GPUs running hot.
The Nvidia GPU lineup (mid-2026):
| GPU | Launch | Approximate price | Use case |
|---|---|---|---|
| A100 | 2020 | $10K-$15K | Legacy AI workloads |
| H100 | 2022 | $25K-$40K | Mainstream LLM training/inference |
| H200 | 2024 | $30K-$50K | Improved memory for large models |
| B100/B200 | 2024-2025 | $40K-$60K | Next-gen Blackwell architecture |
| B300 (Blackwell Ultra) | Jan 2026 | $40K-$60K | 288GB HBM3e; DGX B300 system $300-350K |
| GB200 NVL72 (rack-scale) | 2024+ | $3M+ per rack | Full system for largest training |
| Vera Rubin (R100) | sampling Q4 2026 | TBD | Next-gen architecture; HBM4 288GB, 13 TB/s; DGX Rubin rack ~$3.5-4M |
Nvidia's total revenue for FY2025 was ~$130B, of which data center revenue was ~$115B. Most went to hyperscalers (Microsoft, Google, Meta, Amazon) and OpenAI/Anthropic.
The training compute requirements:
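Frontier model training (GPT-5, Claude 4, Gemini 2): 10,000-100,000+ GPUs in a single cluster, months of training time, $100M-$1B+ in compute alone.
Mid-size training (specialized models, fine-tuning): 8-256 GPUs, days to weeks, $10K-$1M.
Small training and fine-tuning: 1-8 GPUs, hours to days, $100-$10K.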
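Training compute cost comes down to GPU count × wall-clock hours × hourly rate. The sketch below shows that arithmetic. The cluster sizes, run durations, and blended $/GPU-hour rates are illustrative assumptions and are not taken from any specific lab or provider.

```python
# Rough training-compute cost model: GPUs x wall-clock hours x hourly rate.
# All cluster sizes, durations, and rates are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class TrainingRun:
    name: str
    num_gpus: int
    hours: float               # wall-clock duration of the run
    rate_per_gpu_hour: float   # blended $/GPU-hour (assumed)

    def gpu_hours(self) -> float:
        return self.num_gpus * self.hours

    def compute_cost(self) -> float:
        # Ignores failed runs, experiments, storage, networking, and staff.
        return self.gpu_hours() * self.rate_per_gpu_hour


runs = [
    TrainingRun("small fine-tune", num_gpus=8, hours=24, rate_per_gpu_hour=3.0),
    TrainingRun("mid-size", num_gpus=128, hours=24 * 14, rate_per_gpu_hour=3.0),
    TrainingRun("frontier", num_gpus=25_000, hours=24 * 90, rate_per_gpu_hour=2.5),
]

for run in runs:
    print(f"{run.name:>16}: {run.gpu_hours():>12,.0f} GPU-hours  ${run.compute_cost():>15,.0f}")
```

With these assumed inputs the three runs come to about $576, $129K, and $135M. Each figure falls inside the small, mid-size, and frontier tiers this entry quotes. At frontier scale, a small change in the negotiated hourly rate moves the total by tens of millions of dollars.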
The cloud GPU pricing (2025):
| Provider | Service | H100 hourly | Notes |
|---|---|---|---|
| AWS | P5 instances | $3-$8/hour | On-demand; reserved cheaper |
| Google Cloud | A3 instances | $3-$7/hour | Strong A100/H100 supply |
| Azure | ND H100 v5 | $3-$8/hour | OpenAI's primary partner |
| Specialty (Lambda, Modal, Replicate, Together) | Various | $2-$6/hour | Better availability for startups |
Annual cost: a single H100 running 24/7 = $25K-$70K per year (depending on provider and reservation).
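The $25K-$70K figure is the hourly rate multiplied by 8,760 hours. The sketch below reproduces it and adds a naive rent-vs-buy break-even. The $30K purchase price is an assumed point inside the $25K-$40K range. Power, cooling, hosting, networking, and staff are deliberately left out, so a real break-even would come later than this estimate.

```python
# Annual cost of one GPU rented around the clock, plus a naive rent-vs-buy
# break-even. Hourly rates follow the pricing table; the purchase price is an
# assumption. Power, cooling, hosting, and staff are ignored.

HOURS_PER_YEAR = 24 * 365  # 8,760


def annual_rental_cost(rate_per_hour: float, utilization: float = 1.0) -> float:
    """Rental cost for one GPU over a year at the given utilization (1.0 = 24/7)."""
    return rate_per_hour * HOURS_PER_YEAR * utilization


def breakeven_months(purchase_price: float, rate_per_hour: float,
                     utilization: float = 1.0) -> float:
    """Months of rental spend that add up to the purchase price (hardware only)."""
    monthly_rent = annual_rental_cost(rate_per_hour, utilization) / 12
    return purchase_price / monthly_rent


for rate in (3.0, 8.0):
    print(f"${rate:.0f}/hr -> ${annual_rental_cost(rate):,.0f}/year")
print(f"break-even at $3/hr, $30K purchase: {breakeven_months(30_000, 3.0):.1f} months")
```

At $3 and $8 per hour, this gives $26,280 and $70,080 per year. At 50% utilization the break-even period doubles, which is one reason owned hardware favors workloads that keep it busy.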
The GPU shortage and supply chain:
2022-2024: severe GPU shortage. Companies waited 6-12 months for orders.
2024-2025: shortage easing as Nvidia ramps production. B200/B300 launch.
Allocation politics: hyperscalers and frontier labs get priority. Smaller startups often access via cloud providers.
Export controls: US restrictions on selling high-end GPUs to China affect supply dynamics.
Custom silicon: Google TPU, Amazon Trainium, and Microsoft Maia are all attempts to reduce Nvidia dependence. Apple and Meta are also designing custom AI chips.
The startup implications:
Don't build foundation model labs: requires $100M-$1B+ in GPU infrastructure.
Use cloud GPUs: AWS, GCP, Azure, plus specialty providers (Lambda, Modal, Replicate, Together AI, RunPod).
Inference vs training: most startups never train models; they only run inference, which is much cheaper.
API vs self-hosted: API access (OpenAI, Anthropic) abstracts away GPU cost; self-hosting requires GPU management.
Cost per query: dominated by GPU compute, amortized across all the queries the hardware serves.
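To see how GPU cost becomes cost per query, spread the hourly rate over the queries actually served in that hour. Below is a minimal sketch. The throughput (queries per second per GPU) and average utilization are hypothetical placeholders. Real values depend on the model, batch size, and serving stack, and should be measured.

```python
# Rough GPU cost per query for self-hosted inference. Throughput and
# utilization are hypothetical placeholders, not benchmarks.


def cost_per_query(rate_per_gpu_hour: float,
                   queries_per_second_per_gpu: float,
                   utilization: float) -> float:
    """GPU dollars per query: hourly rate spread over the queries actually served.

    utilization is the fraction of paid GPU time spent serving real traffic;
    idle time still costs money, so lower utilization raises cost per query.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    queries_per_hour = queries_per_second_per_gpu * 3600 * utilization
    return rate_per_gpu_hour / queries_per_hour


# Hypothetical: $4/hr H100, 2 queries/sec at full load, 40% average utilization.
c = cost_per_query(4.0, 2.0, 0.4)
print(f"${c:.5f} per query, ${c * 1_000_000:,.0f} per million queries")
```

Under these assumptions a query costs about $0.0014, or roughly $1,400 per million queries. Halving utilization doubles that cost. This is why idle self-hosted capacity hurts unit economics more than the headline hourly rate suggests.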
The economics evolution:
Per-FLOP cost decline: FLOPs per dollar improve roughly 30-50% per year.
Combined with model efficiency gains: total cost per useful inference drops ~10x every 18-24 months.
Inference vs training cost ratio: shifting toward inference as model usage scales. Training a model is a largely one-time cost; inference cost recurs for as long as the model serves traffic.
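The sketch below converts the "~10x per 18-24 months" trend into an annual rate and projects a cost forward, assuming the trend simply continues. The 10x factor, the 18- and 24-month windows, and the 30-50% FLOPs-per-dollar figure come from the text. The $1.00 starting cost is an arbitrary example.

```python
# Converts "~10x cheaper per 18-24 months" into annual rates and projects a
# cost forward. Assumes the compound rate of decline holds steady.


def annual_factor(total_drop: float, months: float) -> float:
    """Per-year cost reduction factor implied by `total_drop`x over `months`."""
    return total_drop ** (12 / months)


def projected_cost(cost_today: float, total_drop: float, months: float,
                   years: float) -> float:
    """Cost after `years` if the same compound decline continues."""
    return cost_today / annual_factor(total_drop, months) ** years


for window in (18, 24):
    f = annual_factor(10, window)
    print(f"10x per {window} months = {f:.2f}x cheaper per year ({1 - 1 / f:.0%} annual decline)")

# Hardware alone: FLOPs per dollar improving 30-50%/year cuts cost per FLOP by:
for gain in (0.30, 0.50):
    print(f"{gain:.0%} more FLOPs/$ -> {1 - 1 / (1 + gain):.0%} lower cost per FLOP")

print(f"$1.00 today -> ${projected_cost(1.00, 10, 24, 2):.2f} in 2 years (24-month window)")
```

On the text's own numbers, the combined trend is a 68-78% annual decline. Hardware alone accounts for only a 23-33% drop in cost per FLOP, so most of the improvement would have to come from model and software efficiency. That component is harder to count on when setting prices.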
The Nvidia moat:
CUDA software ecosystem: 15+ years of CUDA libraries and optimization make switching to other accelerators hard.
Hardware advantage: industry-leading performance per chip and per system.
Network effects: AI talent trained on CUDA; libraries built for CUDA; legacy code on CUDA.
Capital allocation: Nvidia is investing $30B+/year in R&D, far ahead of competitors.
GPU cost is the physics underneath your AI economics, and the right move depends on which game you're in. If you're building an application, use API access and don't touch GPUs. If you're building AI infrastructure, use specialty providers like Lambda, Modal, or Together AI. Running a foundation-model lab is a different and far more capital-intensive game. The trap is self-hosting production AI without GPU-ops expertise and hoping costs fall fast enough to rescue bad unit economics. Design around today's prices instead.
What founders get wrong: Underestimating the GPU dimension of AI economics. Inference costs are GPU costs in disguise; understanding the GPU layer helps predict cost trajectory and plan around it. The right discipline: use cloud GPUs/APIs at small scale; understand the GPU supply chain dynamics; design pricing assuming current costs (improvements are a bonus).
Related: Inference Cost · Foundation Model · Training Data · AI Startup
What is GPU cost?
The underlying compute cost of training and running AI models, dominated by Nvidia's data center GPU lineup (H100, H200, B200). Single chips cost roughly $25K-$60K. GPU availability and cost are the limiting factor for AI training and a significant component of inference economics.
How much does an H100 cost?
$25K-$40K to buy. Cloud rental: $3-$8/hour on AWS/GCP/Azure, $2-$6/hour on specialty providers (Lambda, Modal, RunPod). Annual cost running 24/7: $25K-$70K depending on provider.
How many GPUs are needed to train a foundation model?
Frontier models (GPT-5, Claude 4): 10,000-100,000+ GPUs in a single cluster, months of training time, and $100M-$1B+ in compute alone. Mid-size training: 8-256 GPUs, days to weeks, $10K-$1M. Small fine-tuning: 1-8 GPUs, hours to days, $100-$10K.
Why is Nvidia so dominant?
CUDA software ecosystem (15+ years of optimization), industry-leading hardware performance, network effects (AI talent trained on CUDA, libraries built for CUDA), and $30B+/year R&D investment. Custom alternatives (Google TPU, Amazon Trainium, Microsoft Maia) are gaining ground, but Nvidia remains dominant.