
i built a mandelbrot set renderer in cuda to see how much faster gpu parallelism is compared to cpu-based approaches like pthreads and openmp. the answer: a lot
the numbers
| method | time (seconds) |
|---|---|
| cuda (gpu) | 0.125 |
| openmp (cpu) | 0.221 |
| pthreads (cpu) | 0.643 |
cuda comes in ~1.8x faster than openmp and ~5x faster than pthreads, and the mandelbrot set is a relatively simple workload. the gap should grow as the per-pixel computation gets heavier
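the speedups are just ratios of the times in the table:

$$
\frac{0.221\,\text{s}}{0.125\,\text{s}} \approx 1.77 \;(\text{cuda vs. openmp}), \qquad \frac{0.643\,\text{s}}{0.125\,\text{s}} \approx 5.14 \;(\text{cuda vs. pthreads})
$$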
why gpus are good at this
the mandelbrot set is an embarrassingly parallel problem: no pixel depends on any other pixel. for each one, you take its point c on the complex plane, start from z = 0, and iterate z = z² + c until z escapes or you hit the max iteration count
cpus are great at complex, branching, sequential logic. but when you have 2 million pixels that all need the same computation? that's what gpus were designed for
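the per-pixel work really is tiny. here's a minimal plain-c sketch of the escape-time loop described above, for illustration only: `escape_iterations` is a hypothetical name, not necessarily how the renderer's kernel structures it. it compares |z|² against 4.0 so it never needs a square root. in a cuda version, the same body could be marked `__device__` and called once per thread:

```c
/* escape-time iteration for one point c = cr + ci*i.
   starts at z = 0 and applies z = z^2 + c until |z| > 2 or
   max_iter iterations have run. a return value of max_iter
   means the point never escaped (treated as inside the set). */
int escape_iterations(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;

    /* |z| > 2 is the same test as |z|^2 > 4, which skips the sqrt */
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double next_zr = zr * zr - zi * zi + cr; /* real part of z^2 + c */
        zi = 2.0 * zr * zi + ci;                 /* imaginary part of z^2 + c */
        zr = next_zr;
        n++;
    }
    return n;
}
```

for example, c = 0 and c = -1 stay bounded forever and return max_iter, while c = 2 escapes after two iterations (0 → 2 → 6).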
how it works
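the kernel launches a 60×34 grid of 32×32 thread blocks, one thread per pixel, which covers the whole 1920×1080 image (34 × 32 = 1088 rows, so the threads in the last 8 rows fall outside the image and have to skip their work). each thread: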
- maps its `(x, y)` position to a point on the complex plane (real: [-2.0, 1.0], imaginary: [-0.85, 0.8375])
- iterates z = z² + c up to 1000 times
- bails out early if |z| > 2.0 (the point has escaped)
- uses logarithmic smoothing on the escape time to avoid banding artifacts
- maps the smoothed value to an rgb color using a polynomial gradient
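here's a plain-c sketch of that layout under a few assumptions: the macro and function names are hypothetical, pixel row 0 maps to the minimum imaginary value (flip `y` if you want the imaginary axis to point up), and the four nested loops are a sequential stand-in for the gpu launching every block and thread at once. in a real kernel, a thread would get its pixel from `blockIdx.x * blockDim.x + threadIdx.x` (and likewise for `y`) instead of from loop counters:

```c
/* image size and block size from the post */
#define WIDTH  1920
#define HEIGHT 1080
#define BLOCK  32                 /* 32x32 = 1024 threads per block */

/* complex-plane window from the post */
#define RE_MIN (-2.0)
#define RE_MAX (1.0)
#define IM_MIN (-0.85)
#define IM_MAX (0.8375)

/* ceiling division: blocks needed to cover n pixels */
int blocks_needed(int n, int block)
{
    return (n + block - 1) / block;
}

/* map pixel (x, y) to its point c = cr + ci*i.
   assumption: pixel row 0 maps to IM_MIN */
void pixel_to_complex(int x, int y, double *cr, double *ci)
{
    *cr = RE_MIN + x * (RE_MAX - RE_MIN) / WIDTH;
    *ci = IM_MIN + y * (IM_MAX - IM_MIN) / HEIGHT;
}

/* sequential stand-in for the launch: walk every (block, thread)
   index the grid creates and count the threads whose pixel is
   inside the image. the others hit the bounds check and do nothing. */
long count_active_threads(void)
{
    int grid_x = blocks_needed(WIDTH, BLOCK);  /* 60 */
    int grid_y = blocks_needed(HEIGHT, BLOCK); /* 34 */
    long active = 0;

    for (int by = 0; by < grid_y; by++)
        for (int bx = 0; bx < grid_x; bx++)
            for (int ty = 0; ty < BLOCK; ty++)
                for (int tx = 0; tx < BLOCK; tx++) {
                    int x = bx * BLOCK + tx;  /* blockIdx.x * blockDim.x + threadIdx.x */
                    int y = by * BLOCK + ty;  /* blockIdx.y * blockDim.y + threadIdx.y */
                    if (x >= WIDTH || y >= HEIGHT)
                        continue;             /* bounds check */
                    active++;
                }
    return active;
}
```

the numbers line up: 60 × 34 blocks × 1024 threads = 2,088,960 threads for 2,073,600 pixels, so 15,360 threads (the 8 extra rows) do nothing. the window also gives both axes the same step, 3.0 / 1920 = 1.6875 / 1080 = 0.0015625 per pixel, so pixels come out square.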
the coloring is the fun part — a (1-t)^n * t polynomial for each channel creates that smooth blue-to-gold-to-dark gradient you see in the image
what i learned
- stb_image_write is great for quick image output in c without pulling in a massive library
- cuda's `__device__` functions make it easy to break gpu code into clean helper functions
- learning how a gpu actually beats a cpu was pretty cool: gpus are better when the same operation runs in parallel across many threads
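- understanding what `cudaDeviceSynchronize()` does helped me connect regular c to cuda, and cpu code to gpu code: kernel launches are asynchronous, so the cpu keeps going until it explicitly waits for the gpu to finish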
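one pattern that goes well with `__device__` helpers (a general cuda idiom, not necessarily what this renderer does) is hiding the qualifiers behind a macro, so the same function compiles with nvcc for the gpu and with a plain c compiler for a cpu reference or quick tests. a minimal sketch; `HD` and `mandel_step` are hypothetical names:

```c
/* nvcc defines __CUDACC__, so HD becomes the cuda qualifiers there;
   under a plain c compiler it expands to nothing. */
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

/* one step of z = z^2 + c, callable from host code and (under nvcc)
   from a kernel */
HD static inline void mandel_step(double *zr, double *zi, double cr, double ci)
{
    double next_zr = (*zr) * (*zr) - (*zi) * (*zi) + cr;
    *zi = 2.0 * (*zr) * (*zi) + ci;
    *zr = next_zr;
}
```

calling it on the cpu is then ordinary c: starting from z = 0 with c = 1, two steps give z = 1 and then z = 2.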
what to improve
- someone contacted me on linkedin about using a custom meta library called delresto for optimized cpu performance. it's been a while since the og post, but it's something i want to work on, and i want to get back in touch with the researcher (who i'll leave unnamed)
the full source is on github if you want to poke around or run it yourself