Threading and Dispatcher Performance in Android

12 December 2025

Android Performance Kotlin Coroutines

A few months ago, we shipped a feature that loaded a user’s transaction history alongside their profile on a single screen. Everything looked fine in development — fast network, small datasets, no visible lag. Then the ANR reports started rolling in from production. Not a handful. Hundreds of them, all pointing to the same screen. The main thread was frozen for 5+ seconds on devices with slower CPUs.

My first instinct was that the network call was somehow running on Main. But it wasn’t — every suspend function was wrapped in withContext(Dispatchers.IO). The actual problem was subtler. We had a JSON parsing step running on Dispatchers.IO after the network response came back. Parsing 200+ transactions with nested objects was CPU-intensive work, sitting on the IO dispatcher alongside dozens of actual blocking calls. The IO pool was saturated, the CPU-bound parsing was waiting in line, and the UI was starving because results weren’t coming back fast enough. The fix was moving the parsing to Dispatchers.Default and keeping IO for actual IO. ANRs dropped to zero.

That experience taught me something I should have understood earlier: dispatchers are not interchangeable labels. They are thread pool configurations with specific sizing, scheduling characteristics, and contention behavior. Picking the wrong one doesn’t just make things slower — it can starve other operations and freeze your app.

How Dispatchers Actually Work

Most developers treat dispatchers as three named slots: Main for UI, IO for network/disk, Default for “everything else.” But what’s actually happening underneath is more interesting, and knowing it changes how you make decisions.

Both Dispatchers.IO and Dispatchers.Default are backed by the same underlying thread pool — an instance of CoroutineScheduler inside kotlinx.coroutines. They don’t each create their own set of threads. The CoroutineScheduler is a work-stealing scheduler with a core pool sized to the number of CPU cores (minimum 2). When you dispatch to Dispatchers.Default, it runs on these core threads. When you dispatch to Dispatchers.IO, the scheduler uses an elasticity mechanism — it can expand the thread count up to 64 (or kotlinx.coroutines.io.parallelism if you’ve set it) to handle blocking operations that would otherwise tie up the core threads.

Here’s the thing — because they share the same scheduler, a thread that was just running an IO task can immediately pick up a Default task without any cross-pool overhead. The separation between IO and Default isn’t about different thread pools. It’s about different concurrency limits on the same pool. Default is limited to CPU core count. IO can expand beyond that to absorb blocking calls. This is why putting CPU-bound work on Dispatchers.IO is wasteful: you’re consuming one of those 64 elastic slots for work that doesn’t actually block, and you’re potentially preventing real blocking IO from getting a thread.
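You can see the sharing directly by printing thread names. A minimal sketch (runBlocking is just for the demo; any coroutine entry point works), where both launches land on threads named DefaultDispatcher-worker-N:

fun main(): Unit = runBlocking {
    // Both dispatchers print a "DefaultDispatcher-worker-N" thread name,
    // because IO and Default are views over the same CoroutineScheduler
    launch(Dispatchers.Default) {
        println("Default runs on: ${Thread.currentThread().name}")
    }
    launch(Dispatchers.IO) {
        println("IO runs on: ${Thread.currentThread().name}")
    }
}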

Dispatchers.Main, on the other hand, is entirely separate. On Android, it’s backed by the main thread’s Looper and Handler. Every dispatch to Main posts a message to the queue and waits for the Looper to process it — which brings us to an important distinction.

Main vs Main.immediate

When you use Dispatchers.Main, every resume goes through Handler.post(). Even if the coroutine is already executing on the main thread, it still posts to the message queue and waits for the next Looper cycle. That’s one extra dispatch — and on a busy main thread, that can mean waiting behind input events, layout passes, and view invalidations.

Dispatchers.Main.immediate checks if the coroutine is already on the main thread. If it is, it resumes immediately in the current execution context without posting to the queue. If not, it falls back to the same Handler.post() behavior. This skips one full dispatch cycle, saving roughly 50-100μs per dispatch depending on message queue pressure.

class TransactionViewModel(
    private val repository: TransactionRepository,
) : ViewModel() {

    // Backing state for the screen; UiState's Loading/Success cases are used below
    private val _uiState = MutableStateFlow<UiState>(UiState.Loading)
    val uiState: StateFlow<UiState> = _uiState.asStateFlow()

    // viewModelScope uses Dispatchers.Main.immediate by default
    fun loadTransactions() {
        viewModelScope.launch {
            // Already on Main.immediate — no extra dispatch
            _uiState.value = UiState.Loading

            val transactions = withContext(Dispatchers.IO) {
                repository.fetchTransactions()
            }

            // Resumes on Main.immediate — immediate if already on main thread
            _uiState.value = UiState.Success(transactions)
        }
    }
}

This is why viewModelScope defaults to SupervisorJob() + Dispatchers.Main.immediate rather than plain Dispatchers.Main. Google made this choice deliberately — in animation code, one extra frame of delay between a state change and the UI update can cause visible stutter. If your coroutine updates a MutableStateFlow that drives a Compose recomposition, Main.immediate means the recomposition is triggered in the same frame rather than being pushed to the next one.

But Main.immediate isn’t universally better. If you have deeply recursive suspend calls that all resolve immediately (no actual suspension), Main.immediate keeps stacking frames without ever yielding. With regular Main, each step goes through the message queue, which effectively unwinds the stack. In extreme cases — think recursive tree traversal where each node is a suspend call — Main.immediate can overflow the stack. If you suspect this is happening, yield() forces a dispatch point and breaks the recursion.
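A sketch of the shape that causes trouble, with TreeNode, children, and process() as hypothetical stand-ins; the point is the periodic yield():

// Hypothetical deep traversal where every node visit is a suspend call.
// On Main.immediate, calls that never actually suspend keep growing the
// call stack; yield() inserts a real dispatch point that lets it unwind.
suspend fun visit(node: TreeNode, depth: Int = 0) {
    if (depth % 100 == 0) yield()
    process(node)
    for (child in node.children) {
        visit(child, depth + 1)
    }
}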

Dispatchers.Unconfined — The One Most People Get Wrong

Dispatchers.Unconfined is the dispatcher that doesn’t dispatch. When a coroutine starts on Unconfined, it executes immediately in the caller’s thread — no queue, no scheduling. But here’s the part that trips people up: after the first suspension point, the coroutine resumes on whatever thread the suspending function happened to complete on. You have zero control over which thread that is.

This means if you launch on Dispatchers.Unconfined and call a suspend function that internally completes on an IO thread, your code after the suspension is now running on that IO thread. If the suspend function completes on a callback thread from a native library, you’re running there. The thread affinity is completely unpredictable after any suspension.

fun demonstrateUnconfined() {
    // Starts on the calling thread (e.g., main)
    CoroutineScope(Dispatchers.Unconfined).launch {
        println("Before suspend: ${Thread.currentThread().name}") // main

        delay(100) // suspends here

        // Resumes on whatever thread the delay timer completed on
        println("After suspend: ${Thread.currentThread().name}") // kotlinx.coroutines.DefaultExecutor
    }
}

So when is Unconfined actually useful? Mostly in testing and event-handling pipelines where you want zero dispatch overhead and you don’t care about thread identity. UnconfinedTestDispatcher in kotlinx-coroutines-test is built on this concept — it lets coroutines run eagerly so your tests don’t need to manually advance time for every launch. In production code, I’d reach for CoroutineStart.UNDISPATCHED on a specific launch instead of Unconfined on the whole scope, because UNDISPATCHED gives you the same “start immediately” behavior for the initial execution while still dispatching normally after suspension. It’s the scoped version of the same optimization without the thread-safety landmine.
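For comparison, a minimal sketch of CoroutineStart.UNDISPATCHED on an ordinary scope (assume the scope's dispatcher is Main):

fun CoroutineScope.startEagerly() {
    launch(start = CoroutineStart.UNDISPATCHED) {
        // Runs immediately on the caller's thread, like Unconfined would
        println("Before suspend: ${Thread.currentThread().name}")

        delay(100) // first real suspension

        // Unlike Unconfined, this resumes on the scope's dispatcher,
        // not on whatever thread the delay timer completed on
        println("After suspend: ${Thread.currentThread().name}")
    }
}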

Dispatchers.IO and Thread Pool Saturation

The default parallelism limit for Dispatchers.IO is 64 threads. That number is based on the assumption that IO-dispatched work is blocking — waiting on network sockets, disk reads, database queries. While a thread is blocked, it’s not using CPU, so you can have many more threads than cores. The number 64 is a practical default: high enough to keep concurrent network requests in flight, low enough to avoid excessive thread creation overhead.
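If you genuinely need more headroom, the limit can be raised through the system property mentioned earlier. A hedged sketch: the property is read once when Dispatchers.IO is first initialized, so it has to be set before any coroutine machinery runs (and raising it is rarely the right fix, as the next paragraph argues):

class MyApplication : Application() {
    override fun onCreate() {
        // Must run before anything touches Dispatchers.IO,
        // because the value is read once at dispatcher creation
        System.setProperty("kotlinx.coroutines.io.parallelism", "128")
        super.onCreate()
    }
}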

The real problem developers run into isn’t the 64-thread limit — it’s putting the wrong kind of work on IO. I’ve seen codebases where JSON deserialization, image decoding, and even sorting large lists all happen on Dispatchers.IO because they were “part of the data loading pipeline.” Each of those is CPU-bound. When they’re running alongside actual blocking calls, you’re using elastic threads for work that should run on the fixed core pool, and CPU-bound work runs slower because you get more context switching and cache thrashing than you would on a pool sized to your core count.

class TransactionRepository(
    private val api: TransactionApi,
    private val parser: TransactionParser,
) {
    suspend fun fetchTransactions(): List<Transaction> {
        // Network call — genuinely blocking IO, belongs on IO
        val rawJson = withContext(Dispatchers.IO) {
            api.getRawTransactions()
        }

        // Parsing — CPU-bound work, belongs on Default
        val transactions = withContext(Dispatchers.Default) {
            parser.parseTransactions(rawJson)
        }

        return transactions
    }
}

In our production app, moving JSON parsing from IO to Default reduced P95 parse times by about 40% on mid-range devices. But how do you know when your IO pool is actually saturated? The clearest signal is a thread dump. If you capture one during a slow operation (via Android Studio’s debugger or Thread.getAllStackTraces()) and see most of your DefaultDispatcher-worker-* threads in BLOCKED or WAITING state on IO operations, you’ve hit saturation. In Perfetto traces, look for gaps between coroutine task slices on the thread track — long gaps mean threads are busy elsewhere and your work is queued. Another symptom: operations that should take milliseconds suddenly take seconds, but only under load. That’s thread starvation — every IO slot is occupied and new work is waiting in the CoroutineScheduler’s global queue.
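A rough diagnostic sketch of that thread-dump check, using only plain Thread APIs (the tag and function name are made up); keep in mind that idle pool workers also park in WAITING, so the snapshot is only meaningful if you capture it during a slow operation:

fun logDispatcherWorkerStates(tag: String = "DispatcherDebug") {
    val workers = Thread.getAllStackTraces().keys
        .filter { it.name.startsWith("DefaultDispatcher-worker") }
    // Produces a map like {RUNNABLE=.., WAITING=.., BLOCKED=..}
    val byState = workers.groupingBy { it.state }.eachCount()
    Log.d(tag, "dispatcher workers: total=${workers.size}, states=$byState")
}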

Dispatchers.Default and the Core Count Connection

Dispatchers.Default is capped at a parallelism equal to the number of CPU cores, with a minimum of 2. On a modern phone with 8 cores, that's 8 threads. This sizing is intentional — for CPU-bound work, adding more threads than cores doesn't make things faster. It makes things slower because of context switching overhead. Each context switch costs roughly 5-15μs on most ARM processors, and each swap can flush the CPU cache, meaning the new thread reloads data from main memory.

The CoroutineScheduler maintains a global queue and per-thread local queues. When a thread finishes its task, it first checks its local queue (fast, no contention), then tries to steal from another thread’s queue (moderate cost), and finally falls back to the global queue (requires synchronization). CPU-bound work benefits from this work-stealing design because related tasks tend to stay on the same thread, preserving cache locality. But if you mix CPU and IO work by dispatching everything to the same dispatcher, blocking IO tasks interrupt the work-stealing pattern and you lose that locality benefit.

withContext Cost Internals

withContext is the standard way to switch dispatchers mid-coroutine, but not every withContext call actually switches threads. When you call withContext with the same dispatcher the coroutine is already running on, the coroutines library takes a fast path — it skips the dispatch entirely and just runs the block inline. No thread switch, no queue, no scheduling overhead. This is why withContext(Dispatchers.Default) { withContext(Dispatchers.Default) { ... } } doesn’t cost you two context switches. The inner call is essentially a no-op from a threading perspective.

When the dispatchers are different, withContext suspends the current coroutine, dispatches the block to the target dispatcher’s queue, and then dispatches back to the original dispatcher when the block completes. That’s two dispatches for one withContext call — one there, one back. At 50-100μs per dispatch on a Pixel 7, a single withContext that actually changes threads costs you 100-200μs round-trip. Consider a screen that loads 10 items from a paginated API, where each item goes through IO fetch → Default parse → Main render. That’s 30 dispatcher switches per page load. At 100μs each, you’re spending 3ms just on dispatching overhead. On a 16ms frame budget, that’s nearly 20% spent on thread coordination.

This is why I don’t recommend wrapping every single function in withContext. If you have a chain of operations that all belong on the same dispatcher, keep them in one block. The overhead of unnecessary context switches is small individually but adds up in hot paths.
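As a hypothetical sketch (TransactionRow, toRowModel(), and the date field are placeholders), collapsing a parse-sort-map chain into one block instead of three separate withContext calls:

// One switch to Default for the whole CPU-bound chain,
// instead of a dispatch round-trip per step
suspend fun prepareRows(
    parser: TransactionParser,
    rawJson: String,
): List<TransactionRow> = withContext(Dispatchers.Default) {
    parser.parseTransactions(rawJson)      // parse
        .sortedByDescending { it.date }    // sort
        .map { it.toRowModel() }           // shape into what the UI renders
}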

limitedParallelism — The Right Way to Control Concurrency

Before limitedParallelism, developers created custom dispatchers with Executors.newFixedThreadPool(n).asCoroutineDispatcher(). This created entirely separate thread pools — those threads couldn’t be shared with anything else. limitedParallelism solves this by creating a view over the parent dispatcher, not a new thread pool. It limits how many coroutines from this view can run concurrently, but the actual threads come from the parent pool.

class AppDispatchers {
    // Limits database operations to 4 concurrent coroutines
    // but uses threads from the IO pool
    val databaseDispatcher = Dispatchers.IO.limitedParallelism(4)

    // Limits file write operations to 2 concurrent coroutines
    val fileWriteDispatcher = Dispatchers.IO.limitedParallelism(2)

    // For heavy computation that shouldn't starve other Default work
    val imageProcessingDispatcher = Dispatchers.Default.limitedParallelism(2)
}

The databaseDispatcher limits database concurrency to 4, which protects SQLite from too many concurrent writers (SQLite serializes writes anyway, so more threads just means more lock contention). The imageProcessingDispatcher limits CPU-intensive image work to 2 threads so it doesn’t monopolize the Default pool and starve other computational work.

But limitedParallelism isn’t free. It adds a coordination layer — a semaphore-like mechanism that tracks how many coroutines are currently active in the view. Each dispatch checks this counter, and if the limit is reached, the coroutine is queued until a slot opens. In most Android apps this overhead is negligible, but IMO it’s good to know you’re trading a small amount of dispatch latency for better resource control. For limitedParallelism(1) as a single-writer pattern, this is essentially a coroutine-based mutex — works well, though a real Mutex might be more readable for that specific use case.
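A sketch of the two single-writer flavors side by side (LogWriter and its file are hypothetical):

class LogWriter(private val logFile: File) {
    // Flavor 1: a single-slot view over the IO pool; every append queues behind it
    private val singleWriter = Dispatchers.IO.limitedParallelism(1)

    suspend fun append(line: String) = withContext(singleWriter) {
        logFile.appendText(line + "\n")
    }

    // Flavor 2: an explicit Mutex, arguably more readable for "one writer at a time"
    private val writeMutex = Mutex()

    suspend fun appendWithMutex(line: String) = writeMutex.withLock {
        withContext(Dispatchers.IO) { logFile.appendText(line + "\n") }
    }
}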

Practical Guidelines

After debugging enough dispatcher-related issues, here's how I think about choosing dispatchers in practice:

- Dispatchers.IO for work that genuinely blocks a thread: network calls, disk reads, database queries.
- Dispatchers.Default for CPU-bound work: JSON parsing, image decoding, sorting or transforming large lists.
- Dispatchers.Main.immediate (the viewModelScope default) for UI state updates, so a coroutine already on the main thread doesn't wait for another Looper cycle.
- Dispatchers.Unconfined only in tests or dispatch-free event pipelines; in production, prefer CoroutineStart.UNDISPATCHED on the specific launch that needs to start eagerly.
- limitedParallelism views to cap concurrency for databases, file writes, or heavy image work without creating separate thread pools.

One last thing — always inject your dispatchers. Hardcoding Dispatchers.IO throughout your codebase makes testing painful because you can’t swap in TestDispatcher. Wrapping dispatchers in an injectable class means your tests run on UnconfinedTestDispatcher or StandardTestDispatcher, giving you deterministic control over coroutine execution without flaky timing issues.

class AppCoroutineDispatchers(
    val main: CoroutineDispatcher = Dispatchers.Main.immediate,
    val io: CoroutineDispatcher = Dispatchers.IO,
    val default: CoroutineDispatcher = Dispatchers.Default,
    val database: CoroutineDispatcher = Dispatchers.IO.limitedParallelism(4),
)
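In tests, the same holder is constructed with test dispatchers instead. A sketch, assuming kotlinx-coroutines-test is on the classpath and a hypothetical class under test that accepts AppCoroutineDispatchers:

class TransactionRepositoryTest {

    @Test
    fun loadsTransactionsDeterministically() = runTest {
        val testDispatcher = StandardTestDispatcher(testScheduler)
        val dispatchers = AppCoroutineDispatchers(
            main = testDispatcher,
            io = testDispatcher,
            default = testDispatcher,
            database = testDispatcher,
        )
        // Construct the class under test with dispatchers, drive it,
        // then run everything queued on the scheduler
        // val repository = TransactionRepository(api = fakeApi, dispatchers = dispatchers)
        // repository.fetchTransactions()
        advanceUntilIdle()
    }
}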

Dispatchers are one of those things that seem simple until they aren’t. They work fine with defaults for most code. But the moment your app hits real-world scale — hundreds of concurrent operations, mixed CPU and IO workloads, tight frame budgets — understanding what’s happening underneath the API becomes the difference between an app that feels smooth and one that freezes on your users’ devices.

Thanks for reading through all of this :), Happy Coding!