Writing Programs for GPUs and CPUs

September 2012

This post is a draft. Content may be incomplete or missing.

There are many tutorials online for 3D graphics APIs like DirectX and OpenGL. Most start with a discussion of the standard graphics pipeline, followed by a deep dive into function calls and sample code. But what’s really going on under the hood? The answer will help you write better, faster rendering code.

The Graphics Device

Although computer graphics began with software-based rasterization, the idea of using dedicated hardware to render graphics has become ubiquitous in computing today. To understand why, we’ll look at what makes graphics hardware special.

A graphics card is an add-in card that plugs into your PC’s motherboard. It may look like just another peripheral, but it is quite complex: a miniature computer in its own right, complete with its own processor (called a GPU) and onboard RAM.

A GPU can be thought of as a highly parallel CPU. Both GPUs and CPUs perform similar tasks (they both execute programs), but they do so very differently:

  • A CPU consists of a handful of complex cores that run at high speeds (around 2-3 GHz). Given a parallelizable data set (one with many elements that can be processed in any order), a CPU can handle only a few elements at once. It moves through the rest of the data set sequentially.

    This is an example of a MIMD (pronounced “mim-dee”) architecture. MIMD stands for “Multiple Instruction / Multiple Data.” On a MIMD architecture, each core runs its own independent stream of instructions on its own data, so a large data set has to be worked through a few elements at a time.

  • GPUs use much simpler cores running at lower speeds (0.5-1 GHz). They make up for this simplicity by packing many cores together (tens, hundreds, or sometimes thousands).

    This system is SIMD: Single Instruction, Multiple Data. A single instruction is applied to many data elements at once. GPUs excel at processing data sets in which each data element can be processed on its own.
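
To make the two models more concrete, here is a toy sketch in Python. It is only an illustration: the function names (simd_step, mimd_step, double, negate) are made up for this post, and Python still walks each list one element at a time. What it models is who chooses the instruction: in SIMD one instruction is shared by every lane, while in MIMD each core brings its own.

```python
def simd_step(instruction, lanes):
    """SIMD: one instruction is issued once and applied to every lane."""
    return [instruction(value) for value in lanes]


def mimd_step(instructions, values):
    """MIMD: each core applies its own instruction to its own value."""
    return [instruction(value) for instruction, value in zip(instructions, values)]


def double(v):
    return v * 2


def negate(v):
    return -v


if __name__ == "__main__":
    # Four lanes, one shared instruction.
    print(simd_step(double, [1, 2, 3, 4]))      # [2, 4, 6, 8]
    # Two cores, each running a different instruction.
    print(mimd_step([double, negate], [1, 2]))  # [2, -2]
```

The SIMD version cannot give different lanes different work, and that restriction is what lets the hardware behind each lane stay simple.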

Although a single CPU core can process a single data element faster than a single GPU core, GPUs can eat through large data sets much faster than CPUs due to the sheer number of cores in a GPU. GPUs are a great fit for computer graphics, since many graphics algorithms are embarrassingly parallel.

For example, in many cases every pixel in an image can be processed independently of any other pixel. A GPU can then fill many pixels at once, while a CPU is stuck filling them in sequentially, a handful at a time.
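
The sketch below shows what pixel independence buys you, under some loud assumptions: the image size, the shade function, and the worker count are all invented for illustration, and a thread pool stands in for GPU cores. Python threads will not make this faster. The point is only that each pixel depends on nothing but its own coordinates, so the work can be handed out in any order and the image comes out the same.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4  # tiny hypothetical image


def shade(index):
    """Compute one pixel's grey value from its own coordinates only."""
    x, y = index % WIDTH, index // WIDTH
    return (x * 255 // (WIDTH - 1) + y * 255 // (HEIGHT - 1)) // 2


def render_sequential():
    # CPU-style: walk the pixels one after another.
    return [shade(i) for i in range(WIDTH * HEIGHT)]


def render_parallel(workers=4):
    # GPU-style in spirit: hand pixels out to several workers at once.
    # pool.map returns results in input order, whichever worker finishes first.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(shade, range(WIDTH * HEIGHT)))


if __name__ == "__main__":
    # Both strategies produce the same image.
    assert render_sequential() == render_parallel()
```

If shade read a neighbouring pixel's result, the two versions could disagree, and the work would no longer split cleanly across cores.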

Splitting the Work

While GPUs are great for graphics tasks, their simplicity makes them inconvenient for general-purpose programming. Because of this, most graphics-intensive programs split their logic between the GPU and the CPU.


  • The CPU manages app logic; the GPU is a worker.
  • The CPU does more general computation, while the GPU handles parallel subtasks.
  • The CPU sends commands to the GPU, giving it work items.
  • OpenGL is just a specification, implemented by the GPU vendor's hardware and drivers.
  • It lets CPUs and GPUs talk to each other.
  • Ramifications: Synchronization
  • Future ramifications / the age of parallelism
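
As a rough picture of the command-and-worker relationship in the notes above, here is a toy model in Python. Everything in it is an assumption made for illustration: a thread plays the GPU, a queue plays the command stream, and the "square" command, gpu_worker, and run_frame are invented names. Real drivers and APIs such as OpenGL are far more involved, so read this as a sketch of the idea and not of any real implementation.

```python
import queue
import threading


def gpu_worker(commands, results):
    """Stand-in for the GPU: process commands in order until told to stop."""
    while True:
        command = commands.get()
        if command is None:  # sentinel: no more work
            commands.task_done()
            break
        name, payload = command
        if name == "square":
            results.append([v * v for v in payload])
        commands.task_done()


def run_frame(batches):
    """Stand-in for the CPU: queue up work, then wait for it to finish."""
    commands = queue.Queue()
    results = []
    worker = threading.Thread(target=gpu_worker, args=(commands, results))
    worker.start()
    for batch in batches:
        # put() returns immediately; the worker gets to the item later.
        commands.put(("square", batch))
    commands.put(None)
    # Synchronization point: block until every queued command is processed.
    commands.join()
    worker.join()
    return results


if __name__ == "__main__":
    print(run_frame([[1, 2], [3]]))  # [[1, 4], [9]]
```

Queuing work is cheap, but reading the results safely requires the CPU to stop and wait. That wait is the synchronization cost the notes refer to.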