NVIDIA: Reduce The Cost Of CPU-Training An LLM From $10 Million To Just $400,000 USD By Buying Our GPUs

NVIDIA took quite a few potshots at the entire CPU industry at Computex 2023. CEO Jensen Huang took the stage for his first live show in four years and boldly (and quite correctly) declared generative AI and accelerated computing the future of computing. Reading a eulogy for the conventional wisdom of Moore's Law, he declared that the time when you could get a 10x speedup in 5 years while keeping power and cost the same is over. In the future, most speedups will come from generative AI and accelerated computing based approaches. He also shared an absolutely lovely total cost of ownership (TCO) analysis with the audience:

NVIDIA presents a Large Language Model (LLM) TCO analysis at Computex:

Let's start with the baseline. A $10 million cluster of 960 CPU-based servers is needed to train 1 LLM (large language model). To be clear, NVIDIA calculated the complete cost of the server cluster needed to train a single large language model (including networking, casing, interconnects - everything) and found that it took roughly $10 million USD and 11 GWh of power to train a single large language model.

On the other hand, if you keep the cost the same and buy a $10 million GPU cluster, you can train 44 large language models for the same cost and a fraction of the power consumption (3.2 GWh). This scenario is called ISO cost in a TCO analysis (keeping the cost the same).

If you shift instead to ISO power, that is, keeping the power consumption the same, then you can actually achieve a 150x speedup by training 150 LLMs with the same power consumption of 11 GWh at a cost of $34 million USD. The footprint of this cluster would still be significantly smaller than that of the CPU cluster.

Finally, if you wanted to keep the workload exactly the same, then you would just need a $400,000 USD GPU server consuming 0.13 GWh to train a single LLM. Essentially, what NVIDIA is saying is that you can train an LLM for just 4% of the cost and just 1.2% of the power consumption, which is a massive reduction compared to CPU-based servers.
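
For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the percentages from the figures quoted above. The scenario labels (including "ISO workload" for the last case) and the helper names `per_llm` and `relative_to_baseline` are made up for this illustration and are not NVIDIA's terminology; the numbers themselves are taken straight from the article.

```python
# Figures as quoted in the article: (cost in USD, energy in GWh, LLMs trained).
# The scenario labels are illustrative, not NVIDIA's own wording.
scenarios = {
    "CPU baseline": (10_000_000, 11.0, 1),
    "GPU, ISO cost": (10_000_000, 3.2, 44),
    "GPU, ISO power": (34_000_000, 11.0, 150),
    "GPU, ISO workload": (400_000, 0.13, 1),
}


def per_llm(cost_usd, energy_gwh, llms):
    """Return (cost per LLM in USD, energy per LLM in GWh)."""
    return cost_usd / llms, energy_gwh / llms


base_cost, base_energy = per_llm(*scenarios["CPU baseline"])


def relative_to_baseline(name):
    """Return per-LLM (cost %, energy %) relative to the CPU baseline."""
    cost, energy = per_llm(*scenarios[name])
    return 100 * cost / base_cost, 100 * energy / base_energy


for name in scenarios:
    cost_pct, energy_pct = relative_to_baseline(name)
    print(f"{name}: {cost_pct:.1f}% of baseline cost, "
          f"{energy_pct:.1f}% of baseline energy per LLM")
```

The same-workload row reproduces the 4% and 1.2% figures quoted above. By the same arithmetic, the two larger GPU clusters work out to an even lower cost and energy per LLM than the single $400,000 server, although that is a derived number and not one NVIDIA presented.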

Written by Usman Pirzada
