How Do Graphics Cards Work? What Is GPU Architecture? Let’s Find Out!

Rakesh Bhardwaj 19th January 2025 in Solution Software Tagged computationalpower, CUDAcores, gaminggraphics, GPUarchitecture, graphicscards, graphicsmemory, MicronMemory, NVIDIA3090, parallelcomputing, raytracing, tensorcores, videogamerendering - 4 Minutes

When you marvel at the realistic graphics of today’s video games, have you ever wondered how many calculations your graphics card performs every second? Is it 100 million? That’s roughly what it took to run Mario 64 in 1996. What about 100 billion? That’s enough to run Minecraft in 2011. For ultra-realistic games like Cyberpunk 2077, your graphics card needs to perform around 36 trillion calculations per second. Let’s dive into the fascinating world of graphics cards and explore how they achieve this extraordinary feat.

Conceptualizing 36 Trillion Calculations Per Second

Imagine solving a long multiplication problem once every second. Now picture every person on Earth doing a similar calculation simultaneously. To reach the power of a graphics card capable of 36 trillion calculations per second, you would need 4,400 Earths full of people working together. It’s mind-boggling to think that a small device in your computer can accomplish this by itself.
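The comparison can be checked with a few lines of arithmetic. This is only a rough sketch: the world population of 8.2 billion is an assumption rather than a figure from the text, and each person is taken to finish exactly one calculation per second.

```python
# Rough check of the "Earths full of people" comparison.
GPU_CALCS_PER_SECOND = 36e12       # 36 trillion, from the text
WORLD_POPULATION = 8.2e9           # assumption: about 8.2 billion people
CALCS_PER_PERSON_PER_SECOND = 1    # one long multiplication per second


def earths_needed(gpu_rate, population, per_person_rate):
    """Number of Earth-sized populations needed to match the GPU's rate."""
    return gpu_rate / (population * per_person_rate)


earths = earths_needed(
    GPU_CALCS_PER_SECOND, WORLD_POPULATION, CALCS_PER_PERSON_PER_SECOND
)
print(f"About {earths:,.0f} Earths")
```

With that population estimate the result lands close to the 4,400 Earths quoted above.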

How Do Graphics Cards Work?

Graphics cards are made up of numerous components working in harmony. Let’s break this down into two parts:

Physical Design and Architecture
Computational Architecture

We’ll also see why GPUs are well suited to parallel workloads such as video game graphics, Bitcoin mining, neural networks, and AI.

Graphics Cards vs. CPUs

The core of a graphics card is the Graphics Processing Unit (GPU). A high-end GPU like the NVIDIA GeForce RTX 3090 has over 10,000 cores, while even a high-end desktop CPU has only around 24.

Think of a GPU as a cargo ship and a CPU as a jumbo jet. The cargo ship (GPU) carries massive amounts of data at once but moves more slowly. The jumbo jet (CPU) carries smaller amounts of data at a much higher speed and can land almost anywhere. CPUs are versatile: they run operating systems, manage input devices, and handle many different kinds of programs. GPUs, on the other hand, excel at applying the same simple calculation to huge amounts of data in parallel, as graphics rendering requires, but they aren’t designed to run an operating system or manage the rest of the computer.

The GPU: A Closer Look

At the heart of a GPU lies a chip with billions of transistors. The GA102 chip in a 3090 graphics card, for example, has 28.3 billion transistors and is divided into several components:

Graphics Processing Clusters (GPCs): 7 clusters.
Streaming Multiprocessors (SMs): Each GPC contains 12 SMs.
CUDA Cores: Each SM houses 128 CUDA cores, arranged as four processing blocks of 32, along with 4 Tensor cores and 1 Ray Tracing core.

With 10,752 CUDA cores, 336 Tensor cores, and 84 Ray Tracing cores, the GPU handles different types of calculations:

CUDA Cores: Perform basic arithmetic operations.
Tensor Cores: Specialized for the matrix multiply-and-add operations used in AI and geometric transformations.
Ray Tracing Cores: Render realistic lighting and shadows.
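The totals follow from multiplying down the hierarchy. The sketch below assumes the GA102 layout of four 32-core processing blocks per SM, one Tensor core per block, and one Ray Tracing core per SM; the constant names are illustrative.

```python
# Core counts for a fully enabled GA102 chip.
GPCS = 7                     # Graphics Processing Clusters, from the text
SMS_PER_GPC = 12             # Streaming Multiprocessors per GPC, from the text
BLOCKS_PER_SM = 4            # assumption: four processing blocks per SM
CUDA_CORES_PER_BLOCK = 32    # CUDA cores in each block
TENSOR_CORES_PER_BLOCK = 1   # assumption: one Tensor core per block
RT_CORES_PER_SM = 1          # assumption: one Ray Tracing core per SM

sms = GPCS * SMS_PER_GPC
cuda_cores = sms * BLOCKS_PER_SM * CUDA_CORES_PER_BLOCK
tensor_cores = sms * BLOCKS_PER_SM * TENSOR_CORES_PER_BLOCK
rt_cores = sms * RT_CORES_PER_SM

print(f"SMs: {sms}")
print(f"CUDA cores: {cuda_cores}")
print(f"Tensor cores: {tensor_cores}")
print(f"Ray Tracing cores: {rt_cores}")
```

Under these assumptions the products match the 10,752 CUDA cores, 336 Tensor cores, and 84 Ray Tracing cores listed above.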

Defects and Binning in GPUs

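Different GPU models, like the 3080 and 3090, often use the same chip design but differ in the number of working cores. Manufacturing defects can leave some cores unusable, so the affected part of the chip is disabled and the chip is sorted ("binned") into a lower-tier model, which accounts for the differences in performance and pricing.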

For example:

The 3090 Ti has all 10,752 CUDA cores active.
The 3090 has 10,496 working cores.
The 3080 Ti and 3080 have progressively fewer working cores due to manufacturing imperfections.
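One way to read these numbers is in terms of whole Streaming Multiprocessors. The sketch below assumes 128 CUDA cores per SM and that cores are switched off a full SM at a time; both are assumptions for illustration, and `binning_summary` is a hypothetical helper.

```python
# How much of the full chip is active in each model.
FULL_CHIP_CORES = 10_752   # all CUDA cores on the chip, from the text
CORES_PER_SM = 128         # assumption: CUDA cores per Streaming Multiprocessor

models = {"3090 Ti": 10_752, "3090": 10_496}  # working cores, from the text


def binning_summary(active_cores):
    """Return (active SMs, disabled SMs, share of the chip that is active)."""
    active_sms = active_cores // CORES_PER_SM
    disabled_sms = (FULL_CHIP_CORES - active_cores) // CORES_PER_SM
    share_active = active_cores / FULL_CHIP_CORES
    return active_sms, disabled_sms, share_active


for name, cores in models.items():
    active, disabled, share = binning_summary(cores)
    print(f"{name}: {active} SMs active, {disabled} disabled ({share:.1%} of the chip)")
```

On these assumptions the 3090 runs with 82 of the chip's 84 SMs enabled, which is why a chip with a small defect can still be sold as a slightly lower model.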

Inside a CUDA Core

A single CUDA core is a simple calculator built around one operation: multiply two numbers and add a third (a fused multiply-add). Half of the CUDA cores handle only 32-bit floating-point numbers, while the other half can handle either 32-bit integers or 32-bit floating-point numbers. Special Function Units (SFUs) handle more complex tasks like division and square roots.

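Each CUDA core can complete one multiply-add per clock cycle, which counts as two calculations. At a clock speed of about 1.7 GHz, a GPU with 10,496 CUDA cores therefore achieves roughly 35.6 trillion calculations per second.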
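The headline figure can be reproduced as cores × clock × operations per cycle. This sketch assumes each core finishes one multiply-add per clock cycle and that a multiply-add counts as two calculations; the function name is illustrative.

```python
# Peak throughput = cores x clock speed x calculations per core per cycle.
CUDA_CORES = 10_496          # working cores on the 3090, from the text
CLOCK_HZ = 1.7e9             # 1.7 GHz, from the text (rounded)
CALCS_PER_CORE_PER_CYCLE = 2 # assumption: one multiply-add = two calculations


def peak_calcs_per_second(cores, clock_hz, calcs_per_cycle):
    """Theoretical peak rate, assuming every core is busy on every cycle."""
    return cores * clock_hz * calcs_per_cycle


peak = peak_calcs_per_second(CUDA_CORES, CLOCK_HZ, CALCS_PER_CORE_PER_CYCLE)
print(f"{peak / 1e12:.1f} trillion calculations per second")
```

The result is a theoretical peak. The small gap to the 35.6 trillion quoted above comes from rounding the clock speed to 1.7 GHz.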

Graphics Memory: The Data Powerhouse

The GPU’s immense computational power is only useful if the cores are continuously fed with data. That is the job of the graphics memory:

GDDR6X Memory: High-end cards like the 3090 use 24 GB of GDDR6X memory.
Bus Width and Bandwidth: The memory chips connect to the GPU over a 384-bit bus that transfers data at around 1 TB per second (936 GB per second on the 3090).

A typical CPU’s memory bandwidth, by comparison, is around 64 GB per second. GPUs demand much faster data throughput to keep thousands of cores busy and render graphics in real time.
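Peak memory bandwidth is the bus width multiplied by the data rate of each pin. The sketch below assumes a 384-bit bus and two illustrative per-pin rates, 19.5 and 24 Gbit/s (the per-pin rates are assumptions, not figures from the text), and compares the results with the 64 GB per second CPU figure.

```python
# Peak bandwidth = bus width (pins) x data rate per pin, converted to bytes.
BUS_WIDTH_BITS = 384           # assumption: width of the GPU memory bus
CPU_BANDWIDTH_GB_PER_S = 64    # CPU figure from the text


def peak_bandwidth_gb_per_s(bus_width_bits, gbit_per_s_per_pin):
    """Peak bandwidth in GB/s; dividing by 8 converts bits to bytes."""
    return bus_width_bits * gbit_per_s_per_pin / 8


for pin_rate in (19.5, 24.0):  # assumed per-pin data rates in Gbit/s
    gpu_bw = peak_bandwidth_gb_per_s(BUS_WIDTH_BITS, pin_rate)
    ratio = gpu_bw / CPU_BANDWIDTH_GB_PER_S
    print(f"{pin_rate} Gbit/s per pin: {gpu_bw:.0f} GB/s, about {ratio:.0f}x the CPU figure")
```

Either way, the GPU’s memory moves data more than ten times faster than the CPU figure, which is what keeps its cores supplied.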

Advanced Graphics Memory Technology

Modern graphics memory uses sophisticated encoding schemes:

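PAM4 Encoding: Used in GDDR6X to represent data with four voltage levels, so each transmitted symbol carries two bits instead of one.
PAM3 Encoding: Adopted in GDDR7, using three voltage levels to improve signal integrity and power efficiency.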
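The idea behind PAM4 can be sketched in a few lines: every pair of bits becomes one of four signal levels. The mapping below is hypothetical (levels are numbered 0 to 3 instead of real voltages, and the bit-to-level assignment is chosen for readability), so treat it as an illustration of the principle, not of the actual GDDR6X signaling.

```python
# Hypothetical PAM4 mapping: each 2-bit pair becomes one of four levels
# (0 = lowest voltage, 3 = highest). Real hardware may assign pairs differently.
PAM4_LEVELS = {"00": 0, "01": 1, "10": 2, "11": 3}


def pam4_encode(bits):
    """Turn a bit string into a list of four-level symbols, two bits per symbol."""
    if len(bits) % 2 != 0:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[bits[i:i + 2]] for i in range(0, len(bits), 2)]


def pam4_decode(symbols):
    """Turn a list of four-level symbols back into the original bit string."""
    level_to_bits = {level: pair for pair, level in PAM4_LEVELS.items()}
    return "".join(level_to_bits[s] for s in symbols)


symbols = pam4_encode("11010010")
print(symbols)               # 8 bits become 4 symbols
print(pam4_decode(symbols))  # decoding recovers the original bits
```

Because eight bits travel as four symbols, the same wire carries twice the data per symbol that two-level signaling would.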

Micron, a leader in graphics memory technology, also produces High Bandwidth Memory (HBM), which is used in AI chips. HBM stacks memory dies vertically and connects them through the stack, significantly boosting data transfer rates.

Conclusion

Graphics cards are marvels of modern engineering, enabling the breathtaking visuals we see in today’s video games. From billions of transistors to sophisticated memory technologies, each component plays a crucial role in rendering graphics at incredible speeds. Next time you play your favorite game, take a moment to appreciate the technological wizardry inside your computer.
