Chasing Primes

A 25-Year Compute Journey Measured by the Sieve of Eratosthenes

The Programming Rite of Passage

Counting prime numbers up to a given limit is a classic rite of passage for new programmers. It’s often assigned early in a course because it naturally introduces key concepts such as variable types, loops, arrays, procedures, and basic algorithm optimization.

A prime number is a whole number greater than 1 that can be divided evenly only by 1 and itself. For example, 7 is prime because its only divisors are 1 and 7, while 15 is not prime because it can be divided evenly by 1, 3, 5, and 15.

Most beginners approach prime detection by checking divisibility and examining remainders. While this method works, and can be improved with smarter logic, it is not the most efficient for large ranges.
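The divisibility-and-remainder approach described above can be sketched in C roughly as follows. This is a minimal illustration, not the code behind the benchmarks; the function names `is_prime_trial` and `count_primes_trial` are hypothetical, and stopping at the square root is one of the "smarter logic" refinements a beginner might add.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trial division: n is prime if no d in [2, sqrt(n)] divides it evenly.
 * Function names are illustrative, not taken from the article's program. */
bool is_prime_trial(uint32_t n)
{
    if (n < 2) return false;
    if (n < 4) return true;           /* 2 and 3 are prime */
    if (n % 2 == 0) return false;     /* even numbers above 2 are composite */

    /* Test odd divisors only; d <= n / d means d*d <= n without overflow. */
    for (uint32_t d = 3; d <= n / d; d += 2) {
        if (n % d == 0) return false; /* zero remainder: d divides n */
    }
    return true;
}

/* Count primes in [2, limit] by testing every candidate individually. */
uint32_t count_primes_trial(uint32_t limit)
{
    uint32_t count = 0;
    /* 64-bit counter so the loop also terminates when limit == UINT32_MAX. */
    for (uint64_t n = 2; n <= limit; n++) {
        if (is_prime_trial((uint32_t)n)) count++;
    }
    return count;
}
```

Every candidate is tested independently here, so work done for one number is never reused for the next. That repeated effort is what the sieve avoids.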

An elegant alternative dates back more than two thousand years, to the 3rd century BCE. The mathematician Eratosthenes of Cyrene developed a remarkably efficient technique now known as the Sieve of Eratosthenes. This algorithm is frequently introduced alongside prime-counting assignments because it naturally involves array manipulation and demonstrates how algorithmic thinking can dramatically improve performance.

To see how this timeless mathematical problem scales with physical technology, we compiled and ran a native prime-counting C application on machines spanning a quarter century, tracking how hardware evolution changes raw computational speed.

A Simple Explanation

Goal: Find all prime numbers up to a chosen limit.

  1. Write down all numbers from 2 up to your limit.
  2. Start with the first number, 2. Since it hasn't been crossed out, it's prime.
  3. Cross out all multiples of 2 (4, 6, 8, 10, ...), because they can be divided by 2.
  4. Move to the next uncrossed number, 3. It's prime.
  5. Cross out all multiples of 3 (6, 9, 12, 15, ...).
  6. Repeat: each time you find an uncrossed number, keep it and cross out its multiples.
  7. When you're done, every number that remains uncrossed is prime.
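The steps above can be sketched in C along these lines. It is an illustrative version, not the benchmarked program: the name `sieve_count_primes` is hypothetical, one byte per number is an assumption made for clarity, and crossing out starts at n*n rather than 2n, a common refinement that is safe because smaller multiples of n were already crossed out by smaller primes.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Count primes in [2, limit] with the Sieve of Eratosthenes.
 * Assumption: one byte per number (0 = uncrossed, 1 = crossed out).
 * Returns 0 for limit < 2 and (size_t)-1 if the array cannot be allocated.
 */
size_t sieve_count_primes(size_t limit)
{
    if (limit < 2) return 0;
    if (limit == SIZE_MAX) return (size_t)-1;   /* limit + 1 would overflow */

    /* Step 1: "write down" every number; calloc leaves them all uncrossed. */
    unsigned char *crossed = calloc(limit + 1, 1);
    if (crossed == NULL) return (size_t)-1;

    size_t count = 0;
    for (size_t n = 2; n <= limit; n++) {
        if (crossed[n]) continue;   /* already crossed out: composite */
        count++;                    /* still uncrossed: n is prime */

        /* Cross out multiples of n, starting at n*n (skipped if n*n > limit). */
        if (n <= limit / n) {
            for (size_t m = n * n; m <= limit; m += n) {
                crossed[m] = 1;
            }
        }
    }

    free(crossed);
    return count;
}
```

Under these assumptions, `sieve_count_primes(100)` returns 25, the number of primes up to 100.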

Because the sieve allocates one massive array and repeatedly strides through memory to cross off flags, it makes an excellent modern benchmark not just for raw processor clock speed, but also for memory latency and CPU cache capacity.

Test Your Own Device

How does your current machine handle the Sieve? Select a calculation threshold, click the button below, and see your browser's speed map directly into the live chart and data table!


Visualizing the Gap: 100M Limit Performance

To compare all systems fairly, we mapped out the execution times for calculation targets up to 100 Million. Run the 100M benchmark above to see your device plot instantly against legacy and cutting-edge desktop silicon.

The Raw Benchmark Data

The table below records raw processing times in milliseconds. Lower numbers indicate faster performance.
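The article does not say how the native application captured its timings. As one possible approach, a millisecond measurement in C could be sketched with the standard `clock()` function. The names `count_fn`, `time_ms`, and `count_multiples_of_three` are hypothetical; the last is only a stand-in workload so the sketch is self-contained, and a real run would pass a sieve function instead.

```c
#include <stddef.h>
#include <time.h>

/* Any routine that takes a limit and returns a count (e.g. a sieve). */
typedef size_t (*count_fn)(size_t limit);

/*
 * Run fn(limit) once and return the elapsed time in milliseconds as
 * reported by clock(). The computed count is stored in *result when
 * result is not NULL. Assumption: clock() resolution is adequate; very
 * short runs may legitimately report 0 ms.
 */
double time_ms(count_fn fn, size_t limit, size_t *result)
{
    clock_t start = clock();
    size_t r = fn(limit);
    clock_t end = clock();

    if (result != NULL) *result = r;
    return 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC;
}

/* Hypothetical stand-in workload: counts multiples of 3 in [1, limit]. */
size_t count_multiples_of_three(size_t limit)
{
    size_t count = 0;
    for (size_t n = 1; n <= limit; n++) {
        if (n % 3 == 0) count++;
    }
    return count;
}
```

The C standard defines `clock()` as processor time, but some C runtimes report elapsed wall-clock time instead. Comparisons across operating systems should therefore check which one is being measured.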

Hardware / OS Environment                    | 1M (ms) | 10M (ms) | 100M (ms) | 1B (ms)
Your Device (This Browser)                   | -       | -        | -         | N/A (Browser Limit)
AMD Ryzen 7 9800X3D (Win 11, 64-bit native)  | 1       | 11       | 167       | 3,439
AMD Ryzen 7 9800X3D (Linux Mint, VB VM)      | 1       | 8        | 208       | 4,107
AMD Ryzen 7 9800X3D (Win XP, VB VM 32-bit)   | 1.76    | 12.49    | 5,299     | -
MacBook Air 2018 (macOS Sonoma)              | 6.34    | 62.08    | 797       | 9,502
MacBook Air 2017 (Linux Mint)                | 10.3    | 69.69    | 949       | 11,792
MacBook Pro Early 2015 (Win 11, 64-bit)      | 5.27    | 91.38    | 1,130     | 13,170
MacBook Pro Early 2015 (macOS Monterey)      | 4.24    | 74.08    | 1,221     | 14,629
Acer Aspire One Netbook (WinXP, 1.6GHz Atom) | 42.44   | 639.00   | 7,386     | -
Dell Inspiron 5000e (WinXP, PIII 700MHz)     | 143.13  | 1,905.45 | 24,080    | -
Dell Inspiron 5000e (WinXP, PIII 550MHz)     | 152.37  | 1,977.78 | 25,498    | -

Key Takeaways & Engineering Insights

1. The Generational Leap (150x Improvement)

Looking closely at the 100 Million column, the year-2000 Dell Inspiron 5000e running an Intel Pentium III at 550MHz sputtered through the operation in a sluggish 25.5 seconds (25,498 ms). Fast forward to the AMD Ryzen 7 9800X3D: it chews through the exact same logic in a barely perceptible 167 milliseconds. That is roughly a 153x speedup, highlighting the massive cumulative effect of frequency improvements, branch prediction refinements, and Instructions Per Clock (IPC) gains over 25 years.
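The speedup figure is simply the ratio of the two recorded times. A small helper, with the hypothetical name `speedup`, makes the arithmetic explicit:

```c
/*
 * Speedup of a newer run over an older one, with both times in milliseconds.
 * Returns 0.0 when new_ms is not positive, since the ratio is undefined.
 */
double speedup(double old_ms, double new_ms)
{
    if (new_ms <= 0.0) return 0.0;
    return old_ms / new_ms;
}
```

With the 100M values from the table, `speedup(25498.0, 167.0)` comes to about 152.7.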

2. The Magic of Large CPU Caching (3D V-Cache)

A classic Sieve of Eratosthenes needs one large, contiguous array of flags. When calculating up to 10 Million, that array (about 10MB if each flag occupies one byte, which is an assumption, since the flag size is not stated here) fits entirely within the Ryzen 7 9800X3D's massive 96MB of L3 3D V-Cache. Because the CPU doesn't have to continuously fetch data from system RAM, its processing times fall to a jaw-dropping 8 to 11 milliseconds.
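To make the cache argument concrete, the flag array's footprint can be estimated for each limit and compared with the 96MB L3 figure quoted above. This sketch assumes one byte per flag, which the article does not confirm, and treats 96MB as 96 * 1024 * 1024 bytes. The names `flag_bytes`, `fits_in_cache`, and `L3_CACHE_BYTES` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 96MB L3 figure from the text, taken here as 96 * 1024 * 1024 bytes. */
#define L3_CACHE_BYTES (96ULL * 1024ULL * 1024ULL)

/* Bytes needed for flags 0..limit, assuming one byte per flag. */
uint64_t flag_bytes(uint64_t limit)
{
    return limit + 1;
}

/* True when an array of the given size is no larger than the cache. */
bool fits_in_cache(uint64_t bytes, uint64_t cache_bytes)
{
    return bytes <= cache_bytes;
}
```

Under these assumptions the 10M array is roughly 10MB and fits with room to spare, while the 1B array needs about 1GB and cannot fit. A real run also shares the cache with other data, so sizes close to the limit should not be counted as fitting.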

3. The Severe Tax of Virtualization & Emulation

Look at the Ryzen 7 9800X3D's behavior inside VirtualBox. Running modern Linux Mint adds a comparatively small penalty at 100M (from 167ms to 208ms, roughly 25% slower). Spinning up a 32-bit legacy Windows XP virtual machine, however, sends the same calculation skyrocketing to 5,299 milliseconds, more than 30 times the native result. Pushing a modern Zen 5 processor through a legacy 32-bit guest inside an unoptimized hypervisor configuration destroys execution efficiency.

4. OS Ecosystem Variations

On the Intel Core i5 inside the Early 2015 MacBook Pro, the native application on Windows 11 surprisingly edged out macOS Monterey once the workload scaled up to 1 Billion numbers (13,170ms vs 14,629ms, about 10% less time). This suggests that differences in compiler optimizations (GCC/Clang vs MSVC) change how the sieve's tight nested loops are compiled.

Hardware Timeline Map

2000
Dell Inspiron 5000e
Windows XP (700MHz P3 Static) | 100M = 24,080 ms

Mobile Pentium III (Coppermine architecture), leveraging SDR/early DDR laptop memory speeds.

2009
Acer Aspire One D150
Windows XP (1.6GHz Atom Static) | 100M = 7,386 ms

Intel Atom N270 Netbook era. Low-power, in-order execution processor designed for portability over performance.

2015 - 2018
Intel Broadwell & Amber Lake Era
macOS Sonoma (MacBook Air 2018 i5-8210Y @ 1.6GHz Base / 3.6GHz Turbo) | 100M = 797 ms

MacBook Pro and Air designs built on multi-core, ultra-low-voltage Intel Core processors with high turbo clocks.

Modern Era
AMD Ryzen 7 9800X3D
Windows 11 (64-bit Native @ 4.7GHz Base / 5.2GHz Boost) | 100M = 167 ms

Built on a state-of-the-art TSMC 4nm process node, with a stacked L3 V-Cache layout and blistering IPC throughput.