13 February 2026
I still remember the first time I wrote suspend fun and felt like something magical was happening. The function could pause, go do something else, and come back to exactly where it left off. No callbacks, no Rx chains, no Handler.postDelayed. It just worked.
But "it just works" is a dangerous place to stay. When I started debugging coroutine stack traces that made no sense, or when a delay() resumed my coroutine on a completely different thread than the one it started on, I realized I had no mental model for what was actually happening underneath. And without that model, I was writing coroutines by guesswork.
So I went into the bytecode. And what I found changed how I think about coroutines entirely: your suspend function is not a function. It's a class. A state machine, generated by the Kotlin compiler, with a label field and a when-expression that jumps between states. Every suspend call is a potential exit point, and every resume is a re-entry into that same state machine at the next label. Once you see this, coroutines stop being magic. They become a well-designed compiler trick that you can reason about, debug, and optimize.
Before coroutines, Android had a painful history with async code. AsyncTask, then RxJava, then callback hell. The core problem was always the same: you needed to break sequential logic into pieces that could run later, but you had to wire those pieces together manually.
Kotlin coroutines solve this with Continuation Passing Style (CPS). The idea is old (it comes from Scheme and functional programming), but the Kotlin compiler applies it automatically. Here's what CPS means in practice. When you write:
suspend fun fetchUser(userId: String): User {
    val token = authenticate(userId) // suspension point #1
    val user = loadProfile(token)    // suspension point #2
    return user
}
The compiler transforms this into something conceptually like:
fun fetchUser(userId: String, continuation: Continuation<User>): Any? {
    val token = authenticate(userId, continuation)
    if (token == COROUTINE_SUSPENDED) return COROUTINE_SUSPENDED
    val user = loadProfile(token as Token, continuation)
    if (user == COROUTINE_SUSPENDED) return COROUTINE_SUSPENDED
    return user
}
Two things changed. First, an extra parameter was added: a Continuation<User> object. This is the callback: it knows how to resume the function when the suspended operation completes. Second, the return type changed from User to Any?. That's because the function now returns either the actual User result or a special marker, COROUTINE_SUSPENDED, to signal that it paused. Kotlin doesn't have union types, so Any? is the only way to express "either T or COROUTINE_SUSPENDED." The Continuation interface itself is simple:
public interface Continuation<in T> {
    public val context: CoroutineContext
    public fun resumeWith(result: Result<T>)
}
It holds a CoroutineContext (which contains the dispatcher, job, exception handler) and a single resumeWith function. When the suspended operation finishes, someone calls resumeWith with the result, and the coroutine continues.
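You can drive this interface by hand with nothing but the stdlib. Here is a minimal sketch (greet and runGreetDemo are made-up names for illustration): startCoroutine performs the CPS call, and our hand-written Continuation receives the final result via resumeWith.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

suspend fun greet(name: String): String = "Hello, $name"

// Drive a suspend function by hand: we supply the completion
// continuation that the CPS transform normally threads through.
fun runGreetDemo(): String? {
    var outcome: Result<String>? = null
    val completion = object : Continuation<String> {
        override val context: CoroutineContext = EmptyCoroutineContext
        override fun resumeWith(result: Result<String>) {
            outcome = result // invoked when the coroutine completes
        }
    }
    val body: suspend () -> String = { greet("world") }
    body.startCoroutine(completion) // conceptually: body(completion)
    // greet never actually suspends, so resumeWith has already run
    return outcome?.getOrNull()
}

fun main() {
    println(runGreetDemo()) // prints "Hello, world"
}
```

Because greet never hits a real suspension point, the whole thing runs synchronously; a suspending body would instead return control here and call resumeWith later.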
Here's where it gets interesting. The CPS transformation above was simplified. In reality, the compiler doesn't generate separate function calls with continuation threading. It generates a state machine: a single class with a label field that tracks where the coroutine paused. For our fetchUser function, the compiler generates something like this:
fun fetchUser(userId: String, completion: Continuation<User>): Any? {
    // The continuation IS the state machine. (Pseudo-Kotlin: the real
    // output is bytecode with a jump table whose states fall through;
    // Kotlin's own when-expression does not fall through.)
    val sm = completion as? FetchUserStateMachine
        ?: FetchUserStateMachine(completion)
    when (sm.label) {
        0 -> {
            sm.label = 1
            val result = authenticate(userId, sm)
            if (result == COROUTINE_SUSPENDED) return COROUTINE_SUSPENDED
            sm.result = result
            // Falls through to state 1 in the generated bytecode
        }
        1 -> {
            sm.result.throwOnFailure()
            val token = sm.result as Token
            sm.label = 2
            val result = loadProfile(token, sm)
            if (result == COROUTINE_SUSPENDED) return COROUTINE_SUSPENDED
            sm.result = result
            // Falls through to state 2
        }
        2 -> {
            sm.result.throwOnFailure()
            return sm.result as User
        }
        else -> throw IllegalStateException("Invalid label")
    }
}
The state machine class stores every local variable that needs to survive across suspension points. The label field is just an Int that gets incremented at each suspension point. When the coroutine resumes, resumeWith is called on the continuation, which re-enters the same fetchUser function but now jumps to the correct label.
This is the reframe: there's no thread parking, no fiber, no blocked stack sitting in memory waiting for a signal. There's a class with fields, and a when-expression. Each suspend point is a potential exit, and each resume is a re-entry at the next label. That's it. For N suspension points, the compiler generates N+1 states (0 through N). State 0 is the initial entry, and each subsequent state handles the result of the previous suspension.
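To make the shape concrete, here is a hand-rolled state machine in the same spirit, written against made-up callback-style steps (FetchUserMachine, stepAuth, and stepLoad are all illustrative names, not compiler output):

```kotlin
// A hand-rolled version of the pattern the compiler emits: one class,
// an Int label, and a when that re-enters at the right state.
class FetchUserMachine(private val onDone: (String) -> Unit) {
    private var label = 0
    private var saved: Any? = null // locals surviving "suspension" live here

    fun resume(result: Any?) {
        when (label) {
            0 -> {
                label = 1
                stepAuth("alice") { token -> resume(token) } // "suspends" here
            }
            1 -> {
                saved = result            // the token from step 1
                label = 2
                stepLoad(result as String) { user -> resume(user) }
            }
            2 -> onDone(result as String) // final state: deliver the result
            else -> error("Invalid label")
        }
    }
}

// Callback-style async steps (synchronous here for simplicity)
fun stepAuth(userId: String, cb: (String) -> Unit) = cb("token-for-$userId")
fun stepLoad(token: String, cb: (String) -> Unit) = cb("user-from-$token")

fun runMachine(): String {
    var out = ""
    FetchUserMachine { user -> out = user }.resume(null)
    return out
}

fun main() {
    println(runMachine()) // prints "user-from-token-for-alice"
}
```

Each callback simply calls resume again, and the label decides where execution picks up; that is the whole trick.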
Now you understand why coroutine stack traces look weird. When a coroutine suspends, the actual call stack unwinds completely. The state machine saves local variables into its fields, returns COROUTINE_SUSPENDED up the chain, and the thread is free.
When it resumes, a new call stack is created starting from the dispatcher. The state machine re-enters at the correct label, but the original call stack is gone. This is why you see frames like invokeSuspend and BaseContinuationImpl.resumeWith instead of your actual function hierarchy. Kotlin addresses this with the -Xdebug compiler flag and the kotlinx-coroutines-debug module, which stitch together the logical call stack by tracking continuation chains. But the point stands: understanding the state machine explains why debugging coroutines requires different tools than debugging threads.
When your coroutine resumes, how does it end up on the right thread? This is where ContinuationInterceptor comes in. It's a CoroutineContext.Element that wraps every continuation:
interface ContinuationInterceptor : CoroutineContext.Element {
    // Simplified: the real interface also declares a companion Key
    // and releaseInterceptedContinuation()
    fun <T> interceptContinuation(
        continuation: Continuation<T>
    ): Continuation<T>
}
Every time a coroutine is about to resume, the runtime checks the context for a ContinuationInterceptor. If one exists, it wraps the continuation in a DispatchedContinuation that redirects resumeWith calls through the dispatcher. Dispatchers.Main uses Android's Handler to post the resume to the main thread's Looper. Dispatchers.IO uses a shared thread pool limited to 64 threads by default. Dispatchers.Default uses a thread pool sized to the number of CPU cores (with a minimum of two).
Here's what's subtle: the interception happens per-resume, not per-launch. If your coroutine suspends in one dispatcher context and resumes in another (because you called withContext), the interceptor at the resume site determines the thread. This is why withContext(Dispatchers.IO) actually works: it replaces the interceptor in the context, so when the inner block's continuation resumes, it dispatches to the IO pool. The withContext call doesn't create a new coroutine. It creates a new context with a different dispatcher, suspends the current coroutine, and re-dispatches the continuation.
suspend fun fetchAndParse(): ParsedData {
    // Running on Dispatchers.Main (from viewModelScope)
    val raw = withContext(Dispatchers.IO) {
        // Continuation intercepted and dispatched to the IO pool
        api.fetchRawData()
    }
    // Back on Dispatchers.Main: the interceptor changed back
    return parse(raw)
}
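The interception mechanism is small enough to demonstrate with only the stdlib's kotlin.coroutines API. This toy interceptor (RecordingInterceptor is a made-up name) records each resume at exactly the point where a real dispatcher would post it to another thread:

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.Continuation
import kotlin.coroutines.ContinuationInterceptor
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.startCoroutine

// A toy interceptor (stdlib-only, no kotlinx) that records every resume.
// Real dispatchers wrap continuations the same way, but forward the
// resume to a thread pool or the main Looper instead of a log list.
class RecordingInterceptor(private val log: MutableList<String>) :
    AbstractCoroutineContextElement(ContinuationInterceptor),
    ContinuationInterceptor {

    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> =
        object : Continuation<T> {
            override val context = continuation.context
            override fun resumeWith(result: Result<T>) {
                log += "intercepted resume"     // a dispatcher would post() here
                continuation.resumeWith(result) // then run on its own thread
            }
        }
}

fun runInterceptedDemo(): List<String> {
    val log = mutableListOf<String>()
    val body: suspend () -> Int = { 40 + 2 }
    // startCoroutine consults the completion's context for an interceptor
    body.startCoroutine(object : Continuation<Int> {
        override val context: CoroutineContext = RecordingInterceptor(log)
        override fun resumeWith(result: Result<Int>) {
            log += "done: ${result.getOrNull()}"
        }
    })
    return log
}

fun main() {
    println(runInterceptedDemo()) // [intercepted resume, done: 42]
}
```

The `AbstractCoroutineContextElement(ContinuationInterceptor)` base-plus-interface combination is the same shape kotlinx.coroutines uses for CoroutineDispatcher itself.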
Structured concurrency isn't just a nice API design; it's built into the Job hierarchy. When you call launch or async inside a CoroutineScope, the new coroutine's Job becomes a child of the scope's Job. This parent-child relationship enforces three rules:
1. Cancelling the parent calls cancel() on every child Job.
2. The parent does not complete until all of its children have completed.
3. An uncaught failure in a child cancels the parent, and through it every sibling (unless the parent is a SupervisorJob).

This means viewModelScope.launch { ... } automatically cancels when the ViewModel clears, because the scope's Job is cancelled, which cascades to every child coroutine. The SupervisorJob breaks rule 3: child failures don't propagate upward. This is why viewModelScope uses SupervisorJob + Dispatchers.Main.immediate; you don't want one failing network call to cancel all other coroutines in the ViewModel.
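A quick sketch of failure propagation and the SupervisorJob exception, assuming the kotlinx-coroutines-core dependency is on the classpath (demoSupervision and the swallow handler are illustrative names):

```kotlin
import kotlinx.coroutines.*

// With a plain Job, one child's failure cancels its sibling;
// with a SupervisorJob, the failure stays contained.
fun demoSupervision(): Pair<Boolean, Boolean> = runBlocking {
    val swallow = CoroutineExceptionHandler { _, _ -> /* ignore for the demo */ }

    val plain = CoroutineScope(Job() + swallow)
    val a = plain.launch { delay(500) }
    plain.launch { throw RuntimeException("boom") }
    a.join() // returns early: the failure cancelled the whole scope

    val supervised = CoroutineScope(SupervisorJob() + swallow)
    val b = supervised.launch { delay(100) }
    supervised.launch { throw RuntimeException("boom") }
    b.join() // completes normally: the failure stayed contained

    Pair(a.isCancelled, b.isCancelled)
}

fun main() {
    val (plainCancelled, supervisedCancelled) = demoSupervision()
    println("plain sibling cancelled: $plainCancelled")           // true
    println("supervised sibling cancelled: $supervisedCancelled") // false
}
```

The CoroutineExceptionHandler only swallows the report; it does not stop the cancellation cascade in the plain-Job case.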
One thing that tripped me up early: cancelling a coroutine doesn't forcefully stop it. It sets a flag on the Job (isActive = false) and throws CancellationException at the next suspension point. If your coroutine is doing CPU-heavy work without any suspension points, it won't respond to cancellation:
// This will NOT be cancelled
viewModelScope.launch {
    var sum = 0L
    for (i in 1..1_000_000_000) {
        sum += i // No suspension point, so cancellation can't interrupt
    }
}
You need to explicitly check for cancellation in tight loops:
viewModelScope.launch {
    var sum = 0L
    for (i in 1..1_000_000_000) {
        ensureActive() // Throws CancellationException if cancelled
        sum += i
    }
}
ensureActive() is preferred over checking isActive because it throws immediately rather than requiring you to handle the exit yourself. yield() is another option: it checks for cancellation and also gives other coroutines a chance to run on the same dispatcher.
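A small self-contained sketch of cooperative cancellation, assuming kotlinx-coroutines-core (demoCancellation is an illustrative name): with the ensureActive() check in place, cancelAndJoin() actually returns.

```kotlin
import kotlinx.coroutines.*

// The ensureActive() check is what lets cancelAndJoin() return at all:
// without it, the tight loop below would spin forever.
fun demoCancellation(): Boolean = runBlocking {
    val job = launch(Dispatchers.Default) {
        var sum = 0L
        while (true) {
            ensureActive() // throws CancellationException once cancelled
            sum += 1
        }
    }
    delay(50)           // let the loop spin for a moment
    job.cancelAndJoin() // returns promptly because the loop cooperates
    job.isCancelled     // true
}

fun main() {
    println("cancelled cleanly: ${demoCancellation()}")
}
```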
Coroutines are lightweight, but they're not free. Each coroutine creates a state machine class (allocated on the heap), and each suspension point stores local variables in that object. For a function with 3 suspension points and 5 local variables, that's a field for every local that survives a suspension, plus the label and result fields. In practice, this is negligible for most apps. The overhead of creating a coroutine is roughly comparable to creating a small object, orders of magnitude cheaper than creating a thread (which reserves a stack of roughly 1 MB by default on the JVM).
But there are real tradeoffs. The generated state machine classes increase your method count and APK size slightly. R8/ProGuard can optimize some of this away, but heavily coroutine-based code does produce more classes than equivalent callback code. In one of our projects, I measured roughly 2-3 extra classes per suspend function after compilation. The dispatcher overhead is also real. Dispatchers.Main uses Handler.post(), which goes through the message queue. If you're dispatching thousands of small results back to the main thread, that queue overhead adds up. For tight UI updates, Dispatchers.Main.immediate avoids the re-dispatch if you're already on the main thread.
After going through the internals, a few practical things clicked:
delay() is not Thread.sleep(). It suspends the state machine and schedules a resume via the dispatcher. The thread is free to do other work. This is why you can run thousands of concurrent delay() calls without thousands of threads. And if you've ever wondered how Flow operators work under the hood, the same state machine mechanics power those too.
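This is easy to see directly; a sketch assuming kotlinx-coroutines-core (demoDelays is an illustrative name):

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// 10_000 concurrent delay() calls. Each delay suspends its state
// machine and frees the thread, so the whole batch completes in
// roughly one second, not 10_000 seconds, on a handful of
// Default-dispatcher threads.
fun demoDelays(): Long = measureTimeMillis {
    runBlocking {
        val jobs = List(10_000) {
            launch(Dispatchers.Default) { delay(1000) }
        }
        jobs.joinAll()
    }
}

fun main() {
    println("10_000 delays finished in ${demoDelays()} ms")
}
```

Try the same thing with 10_000 threads each calling Thread.sleep(1000) and watch your memory usage for the difference.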
withContext doesn't create a new coroutine. It suspends the current one, switches the context, and resumes. This means it's cheaper than launch + join for simple context switches.

The continuation object IS the state machine. When you see Continuation<T> in coroutine internals, think "state machine instance." It holds the label, the saved locals, and the resume logic all in one.

Structured concurrency is not optional. Using GlobalScope or creating standalone scopes bypasses the parent-child Job tree. When the ViewModel clears and those coroutines keep running, you'll wonder why your app is burning battery fetching data for a screen that no longer exists.
The moment coroutines stopped being magic for me was when I decompiled a suspend function and saw the label field and the when expression. Everything else (dispatchers, structured concurrency, cancellation) is just APIs built on top of that state machine. And once you see the machine, you can reason about every behavior coroutines exhibit.
Thanks for reading through all of this :) Happy coding!