Getting Stuck: When Threads Betray You

Welcome back to Getting Stuck. This time, instead of the abstract world of Turing machines, I wandered into something that should’ve been simple: writing a little multithreaded program to speed up a task.

It was supposed to be clean, efficient, and elegant. Instead, I ended up chasing ghosts — bugs that showed up, disappeared, and then reappeared when I least expected.

Step 1: The Simple Idea

The task: count how many times a number appears in a giant list.

I thought, Why not parallelize it? With Python’s threading module, I could split the list into chunks and count in parallel.

import threading

numbers = [i % 100 for i in range(10**6)]
count = 0

def worker(chunk):
    global count
    local_count = chunk.count(42)
    count += local_count

threads = []
chunk_size = len(numbers) // 4
for i in range(4):
    t = threading.Thread(target=worker, args=(numbers[i*chunk_size:(i+1)*chunk_size],))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print("Count of 42:", count)

Seems harmless, right?

Step 2: The Bug That Comes and Goes

I ran the program. Sometimes I got the right answer. Other times, the result was… too small.

For example:

Count of 42: 10000

And then:

Count of 42: 9865

Same program. Same input. Different outputs.

This was the first time I felt that familiar Getting Stuck pit in my stomach.

Step 3: Wild Guesses

At first, I blamed Python. Maybe the interpreter was glitching? Maybe the threads weren’t really running in parallel?

So I rewrote it in C with pthread. The same bug appeared. Random wrong answers.

That ruled out Python weirdness.

Then I suspected slicing. Maybe I was cutting the array wrong? I added print statements to show the chunks. Everything was fine.

False leads everywhere.

Step 4: The Real Culprit — Race Conditions

It finally dawned on me. All four threads were doing:

count += local_count

But this isn’t atomic. Under the hood, it’s like:

Read count.
Add local_count.
Write result back.

Two threads doing this at the same time could both read the old value before either one writes back. One update clobbers the other.

That’s why sometimes results were right, sometimes wrong. The order of execution wasn’t guaranteed.

Step 5: The Fix

The fix was to use a lock to protect updates to the shared variable:

lock = threading.Lock()

def worker(chunk):
    global count
    local_count = chunk.count(42)
    with lock:
        count += local_count

Now every update to count was serialized. No clobbering. The result was consistent every time.

Step 6: The Deeper Problem

Happy ending? Not really.

I had solved the immediate race condition, but I noticed something else: performance hadn’t improved at all. In fact, it was slower than the single-threaded version.

Why?

The Global Interpreter Lock (GIL) in CPython meant my threads weren’t really parallelizing CPU-bound work.
My lock added even more overhead.

If I wanted true parallelism, I needed multiprocessing, not threading. My grand “speedup” had turned into a slowdown.

Reflection

This journey was brutal, because the bug wasn’t obvious.

At first, the program worked sometimes — the cruelest kind of bug.
I chased wrong explanations (Python quirks, slicing mistakes).
The real problem was a race condition, one of the trickiest beasts in CS.
Even after fixing it, I realized my whole approach (using threads in Python) wasn’t suited for the problem.

So I got stuck twice: once on the bug, and again on the bigger architectural mistake.

And maybe that’s the truest “getting stuck” there is — when fixing the bug only reveals that your whole plan needs to change.

Arya’s Algorithms

Search This Blog