Chasing Primes: A 25-Year CPU Architecture Showdown

The Programming Rite of Passage

Counting prime numbers up to a given limit is a classic rite of passage for new programmers. It’s often assigned early in a course because it naturally introduces key concepts such as variable types, loops, arrays, procedures, and basic algorithm optimization.

A prime number is a whole number greater than 1 that can be divided evenly only by 1 and itself. For example, 7 is prime because its only divisors are 1 and 7, while 15 is not prime because it can be divided evenly by 1, 3, 5, and 15.

Most beginners approach prime detection by checking divisibility and examining remainders. While this method works, and can be improved with smarter logic, it is not the most efficient for large ranges.
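The remainder-checking approach might look something like the following in C. This is a minimal sketch, not the benchmark's actual source; the names `is_prime_trial` and `count_primes_trial` are made up for illustration, and the "smarter logic" shown (skipping even numbers and stopping at the square root) is just one common refinement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trial division: n is prime if no d in [2, sqrt(n)] divides it evenly.
   (Hypothetical helper name, chosen for this example.) */
bool is_prime_trial(uint32_t n)
{
    if (n < 2) return false;
    if (n < 4) return true;          /* 2 and 3 are prime */
    if (n % 2 == 0) return false;    /* even numbers above 2 are composite */

    /* Only odd divisors up to sqrt(n) need checking.
       Writing the bound as d <= n / d avoids overflow in d * d. */
    for (uint32_t d = 3; d <= n / d; d += 2) {
        if (n % d == 0) return false;   /* remainder 0: d divides n */
    }
    return true;
}

/* Count primes in [2, limit] by testing every number independently. */
uint32_t count_primes_trial(uint32_t limit)
{
    uint32_t count = 0;
    /* 64-bit loop variable so the loop terminates even if limit is UINT32_MAX. */
    for (uint64_t n = 2; n <= limit; n++) {
        if (is_prime_trial((uint32_t)n)) count++;
    }
    return count;
}
```

Each number is tested from scratch here, which is why this approach slows down quickly as the limit grows.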

An elegant alternative dates back more than two thousand years, to the 3rd century BCE. The mathematician Eratosthenes of Cyrene is credited with a remarkably efficient technique now known as the Sieve of Eratosthenes. This algorithm is frequently introduced alongside prime-counting assignments because it naturally involves array manipulation and demonstrates how algorithmic thinking can dramatically improve performance.

To see how this timeless mathematical problem scales across physical technology, we compiled and ran a native prime-counting C application on machines spanning a quarter century, tracking how hardware evolution changes raw computational speed.

A Simple Explanation

Goal: Find all prime numbers up to a chosen limit.

1. Write down all numbers from 2 up to your limit.
2. Start with the first number, 2. Since it hasn't been crossed out, it's prime.
3. Cross out all multiples of 2 (4, 6, 8, 10, ...), because they can be divided by 2.
4. Move to the next uncrossed number, 3. It's prime.
5. Cross out all multiples of 3 (6, 9, 12, 15, ...).
6. Repeat: each time you find an uncrossed number, keep it and cross out its multiples.
7. When you're done, every number that remains uncrossed is prime.
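The steps above translate fairly directly into C. The sketch below is one possible rendering, not the code used for the benchmark; the function name `count_primes_sieve` and the choice of one byte per flag are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns the number of primes in [2, limit], or -1 if allocation fails.
   (Hypothetical function name; one byte per flag is an assumed layout.) */
int64_t count_primes_sieve(uint64_t limit)
{
    if (limit < 2) return 0;

    /* calloc zero-fills the array, so 0 means "not crossed out yet". */
    uint8_t *crossed = calloc(limit + 1, sizeof *crossed);
    if (crossed == NULL) return -1;

    int64_t count = 0;
    for (uint64_t n = 2; n <= limit; n++) {
        if (crossed[n]) continue;   /* already crossed out: composite */
        count++;                    /* still uncrossed: n is prime */

        /* Cross out every multiple of n, starting at 2n as in the steps above. */
        for (uint64_t m = 2 * n; m <= limit; m += n) {
            crossed[m] = 1;
        }
    }

    free(crossed);
    return count;
}
```

A common refinement is to start crossing out at n * n instead of 2n, since smaller multiples have already been marked by smaller primes; the version above keeps the literal form of the steps for clarity.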

Because the sieve allocates a large array and repeatedly strides through memory to mark flags, it makes a useful benchmark not only of raw processor clock speed but also of memory latency and CPU cache capacity.

Test Your Own Device

How does your current machine handle the Sieve? Select a calculation threshold, click the button below, and see your browser's speed map directly into the live chart and data table!

Visualizing the Gap: 100M Limit Performance

To compare all systems fairly, we mapped out the execution times for calculation targets up to 100 Million. Run the 100M benchmark above to see your device plot instantly against legacy and cutting-edge desktop silicon.

The Raw Benchmark Data

The processing times recorded below are in milliseconds. Lower numbers indicate faster performance.

Hardware / OS Environment	1M (ms)	10M (ms)	100M (ms)	1B (ms)
Your Device (This Browser)	—	—	—	N/A (Browser Limit)
AMD Ryzen 7 9800X3D (Win 11, 64-bit native)	1	11	167	3,439
AMD Ryzen 7 9800X3D (Linux Mint, VB VM)	1	8	208	4,107
AMD Ryzen 7 9800X3D (Win XP, VB VM 32-bit)	1.76	12.49	5,299	—
MacBook Air 2018 (macOS Sonoma)	6.34	62.08	797	9,502
MacBook Air 2017 (Linux Mint)	10.3	69.69	949	11,792
MacBook Pro Early 2015 (Win 11, 64-bit)	5.27	91.38	1,130	13,170
MacBook Pro Early 2015 (macOS Monterey)	4.24	74.08	1,221	14,629
Acer Aspire One Netbook (WinXP, 1.6GHz Atom)	42.44	639.00	7,386	—
Dell Inspiron 5000e (WinXP, PIII 700MHz)	143.13	1,905.45	24,080	—
Dell Inspiron 5000e (WinXP, PIII 550MHz)	152.37	1,977.78	25,498	—
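
Timings like those in the table could be gathered with a small harness along the following lines. This is a sketch under assumptions, not the method used for the article: it relies on the standard `clock()` function, which reports processor time and on some platforms cannot resolve fractions of a millisecond, so the sub-millisecond figures above presumably came from a finer timer. The names `prime_counter_fn`, `time_run_ms`, and `placeholder_workload` are hypothetical; the placeholder only stands in for a real prime counter.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical workload signature: takes an upper limit, returns a count. */
typedef int64_t (*prime_counter_fn)(uint64_t limit);

/* Stand-in workload so the example is self-contained: counts odd numbers
   in [1, limit]. Swap in a real prime counter when benchmarking. */
int64_t placeholder_workload(uint64_t limit)
{
    int64_t odd = 0;
    for (uint64_t n = 1; n <= limit; n++) {
        if (n % 2 == 1) odd++;
    }
    return odd;
}

/* Runs fn(limit) once and returns the elapsed processor time in milliseconds.
   The workload's result is stored through result_out (if non-NULL) so the
   caller can check it and the compiler cannot discard the computation. */
double time_run_ms(prime_counter_fn fn, uint64_t limit, int64_t *result_out)
{
    clock_t start = clock();
    int64_t result = fn(limit);
    clock_t end = clock();

    if (result_out != NULL) *result_out = result;
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_run_ms(placeholder_workload, 100000000, &result)` would time one run at the 100M limit; repeating the call several times and keeping the best time is a common way to reduce noise.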

Key Takeaways & Engineering Insights

1. The Generational Leap (150x Improvement)

Looking closely at the 100 million column, the year-2000 Dell Inspiron 5000e running an Intel Pentium III at 550MHz sputtered through the operation in a sluggish 25.5 seconds (25,498 ms). Fast forward to the AMD Ryzen 7 9800X3D: it chews through the exact same logic in a barely perceptible 167 milliseconds. That is roughly a 153x speedup, highlighting the massive cumulative effect of frequency improvements, branch prediction refinements, and Instructions Per Clock (IPC) gains over 25 years.
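
For reference, the speedup figure follows directly from the 100M column of the table. The arithmetic below uses only values from the table; the second line applies the same calculation to the 700MHz configuration.

```latex
\[
\text{speedup}_{550\,\text{MHz}} = \frac{25{,}498\ \text{ms}}{167\ \text{ms}} \approx 152.7
\]
\[
\text{speedup}_{700\,\text{MHz}} = \frac{24{,}080\ \text{ms}}{167\ \text{ms}} \approx 144.2
\]
```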

2. The Magic of Large CPU Caching (3D V-Cache)

A classic Sieve of Eratosthenes needs one large contiguous array of flags. At a limit of 10 million, that array (about 10 MB if each flag takes one byte, which is an assumption about the implementation) fits comfortably within the Ryzen 7 9800X3D's 96MB of L3 cache, most of which is stacked 3D V-Cache. Because the CPU rarely has to fetch data from system RAM, its run time falls to a jaw-dropping 8 to 11 milliseconds.
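
To make the cache argument concrete, the sketch below estimates the flag array's size at each limit and compares it with the L3 capacity. It rests on stated assumptions: the article does not say how the C application stores its flags, so both a one-byte-per-flag layout and a bit-packed layout are shown, and "96MB" is taken to mean 96 MiB. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed L3 capacity: 96 MiB expressed in bytes. */
#define L3_BYTES (96ULL * 1024ULL * 1024ULL)

/* Bytes needed for flags 0..limit with one byte per flag (assumed layout). */
uint64_t sieve_bytes_byte_flags(uint64_t limit)
{
    return limit + 1;
}

/* Bytes needed if eight flags are packed into each byte (alternative layout). */
uint64_t sieve_bytes_bit_packed(uint64_t limit)
{
    return (limit + 1 + 7) / 8;   /* round up to a whole byte */
}

/* True if an array of array_bytes fits within cache_bytes.
   This ignores everything else competing for cache space. */
bool fits_in_cache(uint64_t array_bytes, uint64_t cache_bytes)
{
    return array_bytes <= cache_bytes;
}
```

Under the one-byte-per-flag assumption, the 10M array needs about 10 MB and fits with room to spare, the 100M array needs about 100 MB and sits right at the edge of the cache, and the 1B array needs about 1 GB and cannot fit under either layout.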

3. The Severe Tax of Virtualization & Emulation

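Look at the Ryzen 7 9800X3D's behavior inside VirtualBox. Running modern Linux Mint adds a moderate penalty at 100M (208 ms versus 167 ms, about 25% slower). However, spinning up a 32-bit legacy Windows XP virtual machine sends the 100M time skyrocketing to 5,299 milliseconds, roughly 32 times slower than native! Notably, the same VM stays close to native at 1M and 10M (1.76 ms and 12.49 ms), so the penalty only appears once the working set grows large. The table alone cannot pin down the cause. Likely candidates include the guest's memory configuration and how the hypervisor handles large allocations, rather than 32-bit execution itself, which x86-64 processors run natively.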

4. OS Ecosystem Variations

On the Intel Core i5 inside the Early 2015 MacBook Pro, the native application on Windows 11 surprisingly edged past macOS Monterey at the two largest limits (1,130 ms vs 1,221 ms at 100M, and 13,170 ms vs 14,629 ms at 1 billion), even though macOS was faster at 1M and 10M. Differences in compiler optimizations (for example GCC/Clang versus MSVC) and in how each OS manages large memory allocations are plausible explanations, but these measurements do not isolate the cause.

Hardware Timeline Map

2000

Dell Inspiron 5000e

Windows XP (700MHz P3 Static) | 100M = 24,080 ms

Mobile Pentium III (Coppermine architecture), leveraging SDR/early DDR laptop memory speeds.

2009

Acer Aspire One D150

Windows XP (1.6GHz Atom Static) | 100M = 7,386 ms

Intel Atom N270 Netbook era. Low-power, in-order execution processor designed for portability over performance.

2015 - 2018

Intel Broadwell & Amber Lake Era

macOS Sonoma (MacBook Air 2018 i5-8210Y @ 1.6GHz Base / 3.6GHz Turbo) | 100M = 797 ms

MacBook Pro and Air models built around low-power, multi-core Intel Core processors with high turbo clocks.

Modern Era

AMD Ryzen 7 9800X3D

Windows 11 (64-bit Native @ 4.7GHz Base / 5.2GHz Boost) | 100M = 167 ms

Built on a state-of-the-art TSMC 4nm process node, with stacked L3 3D V-Cache and very high IPC throughput.