Design a Chat Application


Chat apps combine real-time communication, offline support, and local persistence — all core mobile engineering challenges rolled into one design problem.

How would you layer the client architecture for a chat app?

The architecture follows the standard layered approach with a few chat-specific components:

The Room database is the single source of truth. The UI observes the database, and incoming messages from the WebSocket are written to the database first and then displayed. Outgoing messages are also written to the database first with a PENDING status and then sent over the network.

Why WebSocket over long polling or SSE for real-time messaging?

WebSocket is a full-duplex, persistent connection over a single TCP socket. After an HTTP upgrade handshake, both client and server can send messages at any time with minimal overhead: the frame header is as small as 2 bytes for short server-to-client messages, plus a 4-byte masking key on client-to-server frames. This is the standard choice for chat apps because messaging is bidirectional.
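The per-frame overhead depends on payload size and direction. As a rough sketch of the header arithmetic defined in RFC 6455 (the function name is illustrative, not part of any library):

```kotlin
// Size in bytes of a WebSocket frame header, following RFC 6455 section 5.2.
fun frameHeaderSize(payloadLength: Long, masked: Boolean): Int {
    require(payloadLength >= 0) { "payloadLength must be non-negative" }
    val extendedLength = when {
        payloadLength <= 125 -> 0      // length fits in the 7-bit field
        payloadLength <= 0xFFFF -> 2   // 16-bit extended length
        else -> 8                      // 64-bit extended length
    }
    // Client-to-server frames always carry a 4-byte masking key.
    val maskingKey = if (masked) 4 else 0
    return 2 + extendedLength + maskingKey
}
```

A short text message sent by the client therefore costs 6 bytes of framing, which is still far less than the headers of an HTTP request.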

Long polling works by the client sending an HTTP request, and the server holding the connection open until it has new data or a timeout occurs. When the server responds, the client immediately sends another request. Because it is plain HTTP, it works through most proxies and firewalls, but every response-request cycle adds a small latency gap and repeats the full HTTP headers.

SSE (Server-Sent Events) is a one-way channel from server to client over HTTP. The server pushes events, but the client can’t send data back over the same connection, so on its own it isn’t suitable for chat, where the client also needs to send messages.

For a chat app, WebSocket is the right choice. OkHttp has built-in WebSocket support, so the client side is straightforward. Long polling is a reasonable fallback when WebSocket connections are blocked by network proxies.

What data model would you use for messages?

A message needs enough metadata to be ordered, displayed, and synced correctly.

@Entity(
    tableName = "messages",
    indices = [
        Index(value = ["conversationId", "timestamp"]),
        Index(value = ["clientMessageId"], unique = true)
    ]
)
data class MessageEntity(
    @PrimaryKey val id: String,
    val clientMessageId: String,
    val conversationId: String,
    val senderId: String,
    val content: String,
    val type: MessageType,
    val timestamp: Long,
    val localTimestamp: Long,
    val status: MessageStatus,
    val mediaUrl: String? = null,
    val mediaLocalPath: String? = null
)

enum class MessageStatus { PENDING, SENT, DELIVERED, READ, FAILED }
enum class MessageType { TEXT, IMAGE, VIDEO, FILE }

The clientMessageId is a UUID generated by the client when the message is created. This handles deduplication — if the network drops after the server receives the message but before the client gets the acknowledgment, the client retries with the same clientMessageId and the server ignores the duplicate. The id field is the server-assigned ID that arrives with the acknowledgment.
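To make the correlation concrete, here is a minimal in-memory sketch of applying an acknowledgment to a pending message. The `Ack` payload and the `LocalMessage` type are hypothetical stand-ins; the real wire format and the Room entity would differ.

```kotlin
enum class Status { PENDING, SENT }

data class LocalMessage(
    val clientMessageId: String,
    val serverId: String?,      // null until the server acknowledges
    val serverTimestamp: Long,  // 0 until acknowledged
    val status: Status
)

// Hypothetical acknowledgment payload; the actual format is not specified here.
data class Ack(val clientMessageId: String, val serverId: String, val serverTimestamp: Long)

// Promotes the matching pending message to SENT.
// Unknown or already-applied acks leave the store unchanged.
fun applyAck(store: Map<String, LocalMessage>, ack: Ack): Map<String, LocalMessage> {
    val pending = store[ack.clientMessageId] ?: return store
    if (pending.status != Status.PENDING) return store
    return store + (ack.clientMessageId to pending.copy(
        serverId = ack.serverId,
        serverTimestamp = ack.serverTimestamp,
        status = Status.SENT
    ))
}
```

Because a duplicate ack is a no-op, the client can safely receive the same acknowledgment twice after a retry.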

How does the offline message queue work?

Write the message to the local database with a PENDING status immediately. This gives the user instant feedback — they see their message in the conversation right away. Queue the message for delivery and attempt to send it when the network is available.

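import java.util.UUID

class SendMessageUseCase(
    private val messageDao: MessageDao,
    private val chatSocket: ChatConnectionManager,
    private val networkMonitor: NetworkMonitor,
    private val currentUserId: String
) {
    suspend fun send(conversationId: String, content: String) {
        val clientMessageId = UUID.randomUUID().toString()
        val message = MessageEntity(
            // Placeholder until the server assigns the real ID. An empty string
            // would collide on the primary key once two messages are pending.
            id = clientMessageId,
            clientMessageId = clientMessageId,
            conversationId = conversationId,
            senderId = currentUserId,
            content = content,
            type = MessageType.TEXT,
            timestamp = 0, // server timestamp, filled in on acknowledgment
            localTimestamp = System.currentTimeMillis(),
            status = MessageStatus.PENDING
        )

        // Persist first so the UI shows the message immediately.
        messageDao.insert(message)

        // If offline, the message stays PENDING and the sync engine sends it later.
        if (networkMonitor.isOnline.value && chatSocket.isConnected) {
            chatSocket.sendMessage(message)
        }
    }
}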

When the network returns, the sync engine queries all PENDING messages and sends them in order. Once the server acknowledges receipt, update the status to SENT. If the user is offline for a long time, they might have dozens of queued messages — send them sequentially to preserve ordering.
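The sequential flush can be sketched without Room or a socket. In this simplified version, `send` is a hypothetical transport call that returns true once the server has acknowledged the message:

```kotlin
data class QueuedMessage(val clientMessageId: String, val localTimestamp: Long)

// Sends queued messages oldest-first and stops at the first failure, so a later
// message is never delivered ahead of an earlier one. Returns the IDs that were sent.
fun flushQueue(
    pending: List<QueuedMessage>,
    send: (QueuedMessage) -> Boolean
): List<String> {
    val sent = mutableListOf<String>()
    for (message in pending.sortedBy { it.localTimestamp }) {
        if (!send(message)) break
        sent += message.clientMessageId
    }
    return sent
}
```

Stopping at the first failure trades throughput for ordering: the remaining messages stay PENDING and go out on the next flush.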

How do you handle message ordering? What problems can arise?

Message ordering is trickier than it seems. You can’t rely on client timestamps because clocks are unreliable — two users’ phones might differ by minutes. You can’t rely solely on server timestamps because network latency means messages arrive at the server in a different order than they were sent.

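The practical solution for most chat apps is to order confirmed messages by server timestamp and to place messages that are still pending by their local timestamp until the server confirms them: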

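@Dao
interface MessageDao {
    // Unconfirmed messages (PENDING or FAILED) have no server timestamp yet, so they
    // are placed by localTimestamp instead of sorting to the top with timestamp = 0.
    @Query("""
        SELECT * FROM messages
        WHERE conversationId = :conversationId
        ORDER BY
            CASE WHEN status IN ('PENDING', 'FAILED') THEN localTimestamp
            ELSE timestamp END ASC
    """)
    fun getMessagesForConversation(
        conversationId: String
    ): Flow<List<MessageEntity>>
}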

For 1:1 chats, server timestamps with sequence numbers work well. For distributed systems with multiple servers, you might need vector clocks or Lamport timestamps, but that’s beyond what most mobile interviews expect.
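If the interviewer does push on Lamport timestamps, the idea fits in a few lines. This is a generic textbook sketch, not something the design above depends on:

```kotlin
class LamportClock {
    var time: Long = 0
        private set

    // Call before sending: advance the clock and stamp the outgoing message.
    fun tick(): Long {
        time += 1
        return time
    }

    // Call on receipt: jump past the sender's stamp so the reply always orders after it.
    fun receive(remoteTime: Long): Long {
        time = maxOf(time, remoteTime) + 1
        return time
    }
}
```

The clock guarantees that a reply always carries a larger stamp than the message it responds to, regardless of how far apart the two devices' wall clocks are.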

How do you handle retry logic and delivery guarantees?

Chat apps need at-least-once delivery — every message must eventually reach the server. The retry mechanism handles transient failures.

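import java.io.IOException
import kotlinx.coroutines.delay

private const val MAX_ATTEMPTS = 10

class MessageRetryManager(
    private val messageDao: MessageDao,
    private val chatSocket: ChatConnectionManager
) {
    // Assumes sendMessage suspends until the server acknowledges the message and
    // throws IOException on failure; otherwise SENT would be set prematurely.
    suspend fun retryPendingMessages() {
        val pending = messageDao.getPendingMessages()
        for (message in pending) {
            var retryCount = 0
            var success = false
            while (!success && retryCount < MAX_ATTEMPTS) {
                try {
                    chatSocket.sendMessage(message)
                    messageDao.updateStatus(
                        message.clientMessageId, MessageStatus.SENT
                    )
                    success = true
                } catch (e: IOException) {
                    retryCount++
                    // Exponential backoff capped at 60 s; no wait after the final attempt.
                    if (retryCount < MAX_ATTEMPTS) {
                        delay(minOf(1000L * (1 shl retryCount), 60_000L))
                    }
                }
            }
            if (!success) {
                messageDao.updateStatus(
                    message.clientMessageId, MessageStatus.FAILED
                )
            }
        }
    }
}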

Deduplication on the server side is critical. The server uses clientMessageId to detect duplicates — if it receives the same clientMessageId twice, it ignores the second one and returns the original response.
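The server side is out of scope for the interview, but the idempotency rule is easy to illustrate. Here is a minimal in-memory sketch; the class and the `srv-` ID format are made up for illustration, and a real server would use a database unique constraint instead of a map:

```kotlin
data class StoredMessage(val serverId: String, val clientMessageId: String, val content: String)

// Not thread-safe; illustrates only the dedup rule.
class IdempotentInbox {
    private val byClientId = mutableMapOf<String, StoredMessage>()
    private var nextId = 1

    // Stores the message once. A retry with the same clientMessageId
    // gets the original record back instead of creating a second one.
    fun receive(clientMessageId: String, content: String): StoredMessage =
        byClientId.getOrPut(clientMessageId) {
            StoredMessage("srv-${nextId++}", clientMessageId, content)
        }

    val size: Int get() = byClientId.size
}
```

At-least-once delivery plus an idempotent receiver gives the user effectively-once behavior: retries are safe and never produce duplicate bubbles in the conversation.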

How do you manage the WebSocket connection lifecycle?

The WebSocket connection should be active when the app is in the foreground. Managing it involves connecting when the app comes to the foreground, disconnecting when it goes to the background, reconnecting on failure with exponential backoff, and sending periodic heartbeats to detect stale connections.

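import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.Response
import okhttp3.WebSocket
import okhttp3.WebSocketListener

class ChatConnectionManager(
    private val okHttpClient: OkHttpClient,
    private val scope: CoroutineScope
) {
    private var webSocket: WebSocket? = null
    private var retryCount = 0

    fun connect() {
        val request = Request.Builder()
            .url("wss://chat.example.com/ws")
            .build()

        webSocket = okHttpClient.newWebSocket(
            request,
            object : WebSocketListener() {
                override fun onOpen(ws: WebSocket, response: Response) {
                    // Reset the backoff once the connection is established.
                    retryCount = 0
                }
                override fun onMessage(ws: WebSocket, text: String) {
                    handleIncomingMessage(text)
                }
                override fun onFailure(
                    ws: WebSocket, t: Throwable, response: Response?
                ) {
                    scheduleReconnect()
                }
            }
        )
    }

    private fun scheduleReconnect() {
        // Cap the exponent so the shift cannot overflow after many failures.
        val delayMs = minOf(1000L * (1 shl minOf(retryCount, 5)), 30_000L)
        retryCount++
        scope.launch {
            delay(delayMs)
            connect()
        }
    }

    private fun handleIncomingMessage(text: String) {
        // Parse the payload and write it to the database (omitted here).
    }
}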

Don’t keep the WebSocket open when the app is in the background. A persistent connection keeps the radio active and drains battery, and Android may throttle or kill background network activity anyway. Use FCM push notifications to wake the app for new messages when it’s backgrounded.
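For the reconnect backoff, one common refinement is to add random jitter so that many clients don’t reconnect at the same instant after a server restart. This is a sketch of the "full jitter" variant; the function name and defaults are assumptions, not part of OkHttp:

```kotlin
import kotlin.random.Random

// Exponential backoff with full jitter:
// a random delay in [0, min(capMillis, baseMillis * 2^attempt)].
fun reconnectDelayMillis(
    attempt: Int,
    baseMillis: Long = 1_000L,
    capMillis: Long = 30_000L,
    random: Random = Random.Default
): Long {
    require(attempt >= 0) { "attempt must be non-negative" }
    // Clamp the exponent so the shift cannot overflow for large attempt counts.
    val exponential = baseMillis shl attempt.coerceAtMost(20)
    val ceiling = minOf(exponential, capMillis)
    return random.nextLong(ceiling + 1)
}
```

Passing the `Random` instance in makes the delay reproducible in tests while production code uses the default source.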

How do you structure the local Room database schema?

The database needs three main entities: conversations, messages, and users. Design the schema around your query patterns.

@Entity(tableName = "conversations")
data class ConversationEntity(
    @PrimaryKey val id: String,
    val title: String?,
    val lastMessageContent: String?,
    val lastMessageTimestamp: Long,
    val unreadCount: Int,
    val isGroup: Boolean,
    val participantIds: String
)

@Dao
interface ConversationDao {
    @Query("""
        SELECT * FROM conversations 
        ORDER BY lastMessageTimestamp DESC
    """)
    fun observeConversations(): Flow<List<ConversationEntity>>

    @Query("""
        UPDATE conversations SET unreadCount = 0 
        WHERE id = :conversationId
    """)
    suspend fun clearUnreadCount(conversationId: String)
}

Denormalize the lastMessageContent and lastMessageTimestamp into the conversation entity. This avoids a JOIN query every time the conversation list loads. Update these fields whenever a new message arrives in that conversation. Index the messages table on (conversationId, timestamp) since the most common query is fetching messages for a conversation in chronological order.
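The update rule for the denormalized fields can be sketched as a pure function. `ConversationSummary` is a trimmed-down stand-in for the entity above, and the parameter names are illustrative:

```kotlin
data class ConversationSummary(
    val id: String,
    val lastMessageContent: String?,
    val lastMessageTimestamp: Long,
    val unreadCount: Int
)

// Returns the summary row after a message arrives. Older messages (for example
// from a history sync) must not overwrite a newer preview.
fun onMessageArrived(
    summary: ConversationSummary,
    content: String,
    timestamp: Long,
    isFromMe: Boolean,
    isConversationOpen: Boolean
): ConversationSummary {
    val isNewest = timestamp >= summary.lastMessageTimestamp
    val countsAsUnread = !isFromMe && !isConversationOpen
    return summary.copy(
        lastMessageContent = if (isNewest) content else summary.lastMessageContent,
        lastMessageTimestamp = maxOf(timestamp, summary.lastMessageTimestamp),
        unreadCount = summary.unreadCount + if (countsAsUnread) 1 else 0
    )
}
```

In a Room-backed app, this update and the message insert would typically run in the same transaction so the list preview never disagrees with the conversation contents.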

What are the core features you would include in a chat application?

The essential features are:

Start with 1:1 text messaging and expand from there. Interviewers prefer depth over breadth.

What are the key non-functional requirements for a chat app?

What’s in scope and what’s out of scope for a mobile system design interview?

In scope — client architecture, data model, real-time connection strategy, offline queue, local database, sync logic, push notifications, message ordering, and retry mechanism.

Out of scope — server-side message routing and fan-out, infrastructure scaling (Kafka, sharding), signaling for voice/video calls, and payment or commerce features. Mention these briefly if the interviewer asks, but don’t spend time designing them.

What does the API design look like for conversations and messages?

Two main API surfaces — REST for CRUD operations and history, WebSocket for real-time events.

REST endpoints:

WebSocket events (bidirectional):

Every WebSocket message includes a clientMessageId so the client can correlate acknowledgments with pending messages. The REST API handles bulk operations and history, while the WebSocket handles real-time flow.

How do push notifications work for a chat app?

When the app is in the background, the server sends a push notification through FCM. The notification payload should be a data message (not a notification message) so your app has full control over how it’s displayed.

When the user taps the notification, deep link to the specific conversation. Use notification channels and message grouping so multiple messages from the same conversation stack neatly instead of flooding the notification shade. If the WebSocket is connected and the app is in the foreground, the server should skip sending a push notification for that message since the client already received it over the socket.

How do you sync message history after the app has been offline?

When the app opens, it needs to catch up on messages received while it was offline. The client keeps a sync marker for each conversation (in the code below, the timestamp of the newest stored message) and requests everything after it.

class MessageSyncManager(
    private val api: ChatApi,
    private val messageDao: MessageDao
) {
    suspend fun syncConversation(conversationId: String) {
        // 0 means no messages are stored yet, so the full history is fetched.
        val lastTimestamp = messageDao
            .getLastMessageTimestamp(conversationId) ?: 0L

        var cursor: String? = null
        do {
            val response = api.getMessages(
                conversationId = conversationId,
                after = lastTimestamp,
                cursor = cursor
            )
            // insertAll should use a REPLACE or IGNORE conflict strategy so messages
            // already received over the WebSocket are not duplicated.
            messageDao.insertAll(response.messages)
            cursor = response.nextCursor
        } while (response.hasMore)
    }
}

This sync happens in the background after connecting the WebSocket. The WebSocket handles real-time messages going forward, and the REST sync fills in the gap for messages missed while offline. Use cursor-based pagination to handle large gaps efficiently.

How do you implement read receipts?

Track three states per message: sent, delivered, and read.

class ReadReceiptManager(
    private val chatSocket: ChatConnectionManager,
    private val messageDao: MessageDao
) {
    // suspend so the database write stays off the main thread
    // (assumes markMessagesAsRead is a suspend DAO function).
    suspend fun markAsRead(conversationId: String, lastReadMessageId: String) {
        chatSocket.sendReadReceipt(conversationId, lastReadMessageId)
        messageDao.markMessagesAsRead(conversationId, lastReadMessageId)
    }
}

Batch read receipts — don’t send one for every message. When the user scrolls through 20 unread messages, send a single receipt with the ID of the last message they saw. The server marks all messages up to that ID as read.
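The "everything up to this ID" rule looks like this on an in-memory list. The types here are simplified stand-ins for the Room entities:

```kotlin
enum class ReceiptState { SENT, DELIVERED, READ }

data class ReceivedMessage(val id: String, val state: ReceiptState)

// Marks every message up to and including lastReadId as READ.
// `messages` must be in chronological order; an unknown ID changes nothing.
fun markReadUpTo(messages: List<ReceivedMessage>, lastReadId: String): List<ReceivedMessage> {
    val lastIndex = messages.indexOfFirst { it.id == lastReadId }
    if (lastIndex < 0) return messages
    return messages.mapIndexed { index, message ->
        if (index <= lastIndex) message.copy(state = ReceiptState.READ) else message
    }
}
```

In SQL the same rule becomes a single UPDATE with a range condition on the ordering column, which is why one receipt is enough for any number of messages.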

How do you implement typing indicators?

The client detects text input changes and sends a “typing” event over the WebSocket. The receiving client shows “typing…” and hides it after a timeout.

class TypingIndicatorManager(
    private val chatSocket: ChatConnectionManager,
    private val scope: CoroutineScope
) {
    private var typingJob: Job? = null

    fun onTextChanged(conversationId: String, text: String) {
        if (text.isEmpty()) {
            typingJob?.cancel()
            chatSocket.sendTypingEvent(conversationId, false)
            return
        }
        if (typingJob?.isActive != true) {
            chatSocket.sendTypingEvent(conversationId, true)
        }
        typingJob?.cancel()
        typingJob = scope.launch {
            delay(3000)
            chatSocket.sendTypingEvent(conversationId, false)
        }
    }
}

Typing indicators are low-priority — don’t persist them to the database or queue them for offline delivery. They’re fire-and-forget over the WebSocket.
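The receiving side needs its own timeout, because the "stopped typing" event can be lost if the sender drops offline. This is a minimal sketch; the class name and the 5-second default are assumptions, and time is passed in explicitly to keep the logic testable:

```kotlin
// Tracks who is currently typing, from the receiver's point of view.
class TypingTracker(private val timeoutMillis: Long = 5_000L) {
    private val lastTypingAt = mutableMapOf<String, Long>()

    fun onTypingEvent(userId: String, isTyping: Boolean, nowMillis: Long) {
        if (isTyping) {
            lastTypingAt[userId] = nowMillis
        } else {
            lastTypingAt.remove(userId)
        }
    }

    // Users whose last "typing" event is still within the timeout window.
    fun typingUsers(nowMillis: Long): Set<String> =
        lastTypingAt.filterValues { nowMillis - it < timeoutMillis }.keys
}
```

Keeping this state in memory only matches the fire-and-forget nature of the events: nothing needs to survive process death.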

How do you handle media messages like images and videos?

Media messages have a different flow than text messages because the file needs to be uploaded separately from the message metadata.

  1. The user selects a photo. Compress it and generate a thumbnail locally.
  2. Insert a message into the local database with PENDING status and the local file path. Show the thumbnail in the chat immediately.
  3. Upload the file to a storage service (S3, Cloud Storage) in the background. Show upload progress in the UI.
  4. When the upload completes, send the message metadata (including the media URL) over the WebSocket.
  5. The recipient receives the message, downloads the thumbnail first (fast), then the full image on demand or automatically on Wi-Fi.

import android.net.Uri
import java.util.UUID

class MediaMessageSender(
    private val fileUploader: FileUploader,
    private val messageDao: MessageDao,
    private val chatSocket: ChatConnectionManager,
    private val currentUserId: String
) {
    suspend fun sendImage(conversationId: String, imageUri: Uri) {
        // compressImage and createThumbnail are app-level helpers (not shown).
        val compressed = compressImage(imageUri, 1920, 80)
        val thumbnail = createThumbnail(compressed, 200)

        val clientMessageId = UUID.randomUUID().toString()
        val message = MessageEntity(
            id = clientMessageId, // placeholder until the server assigns an ID
            clientMessageId = clientMessageId,
            conversationId = conversationId,
            senderId = currentUserId,
            content = "",
            type = MessageType.IMAGE,
            timestamp = 0, // set from the server acknowledgment
            localTimestamp = System.currentTimeMillis(),
            status = MessageStatus.PENDING,
            mediaLocalPath = compressed.absolutePath
        )
        // Persist first so the thumbnail shows up in the chat immediately.
        messageDao.insert(message)

        val mediaUrl = fileUploader.upload(compressed)
        val thumbUrl = fileUploader.upload(thumbnail)
        messageDao.updateMediaUrl(message.clientMessageId, mediaUrl, thumbUrl)
        chatSocket.sendMediaMessage(message.clientMessageId, mediaUrl, thumbUrl)
    }
}

For large files like videos, use chunked upload with resume support so the upload survives network interruptions. Use WorkManager for background uploads to survive process death.
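Resume support boils down to asking the server how many bytes it has already confirmed and uploading only the rest. This sketch covers the chunk planning; the names are illustrative, and the actual upload protocol depends on the storage service:

```kotlin
data class Chunk(val offset: Long, val length: Long)

// Splits the remaining bytes of a file into fixed-size chunks, starting from the
// offset the server has already confirmed (0 for a fresh upload).
fun remainingChunks(fileSize: Long, chunkSize: Long, confirmedOffset: Long = 0L): List<Chunk> {
    require(chunkSize > 0) { "chunkSize must be positive" }
    require(confirmedOffset in 0L..fileSize) { "confirmedOffset out of range" }
    val chunks = mutableListOf<Chunk>()
    var offset = confirmedOffset
    while (offset < fileSize) {
        val length = minOf(chunkSize, fileSize - offset)
        chunks += Chunk(offset, length)
        offset += length
    }
    return chunks
}
```

After a network interruption, the worker calls this again with the new confirmed offset and continues from there instead of restarting the whole upload.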

How do you make the chat list screen performant?

The chat list shows all conversations sorted by the most recent message. The ViewModel observes a Room Flow that returns conversations ordered by lastMessageTimestamp DESC. When a new message arrives via WebSocket, the repository updates the conversation’s lastMessageContent, lastMessageTimestamp, and unreadCount. Room’s Flow automatically triggers a UI update.

For performance with hundreds of conversations:

The unread count badge should be reactive — decrement it when the user opens a conversation and reads messages. This is a local database update, not a network call.

Full-text search across all messages requires a different approach than standard SQL queries. Room supports FTS (Full-Text Search) through virtual tables.

@Fts4(contentEntity = MessageEntity::class)
@Entity(tableName = "messages_fts")
data class MessageFts(val content: String)

@Dao
interface SearchDao {
    @Query("""
        SELECT messages.* FROM messages
        JOIN messages_fts ON messages.rowid = messages_fts.rowid
        WHERE messages_fts MATCH :query
        ORDER BY messages.timestamp DESC
        LIMIT 50
    """)
    suspend fun searchMessages(query: String): List<MessageEntity>
}

FTS tables create an inverted index over the content column, making text search fast even with millions of messages. The tradeoff is increased database size — the FTS index can be 50-100% of the original data size. For most chat apps, this is acceptable because message text is relatively small. Show search results grouped by conversation so the user can jump to the relevant context.
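Raw user input should not be passed straight into MATCH, because characters such as quotes or a leading minus are parsed as FTS query syntax. One way to build a safe prefix query is sketched below; the helper name is an assumption:

```kotlin
// Turns raw user input into an FTS prefix query such as "hel* wor*".
// Punctuation is dropped so it cannot be parsed as query syntax, and tokens are
// lowercased so the uppercase operators OR, AND and NOT become plain words.
fun toFtsPrefixQuery(input: String): String =
    input.lowercase()
        .split(Regex("[^\\p{L}\\p{N}]+"))
        .filter { it.isNotEmpty() }
        .joinToString(" ") { "$it*" }
```

If the result is empty (the user typed only punctuation), skip the search instead of running MATCH with an empty string.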

How does end-to-end encryption work at a high level?

End-to-end encryption means the server can’t read message content. Only the sender and recipient have the decryption keys.

In practice, apps like Signal use the Signal Protocol, which adds forward secrecy through ratcheting key exchanges: each message uses a different encryption key derived from a chain. If one key is compromised, previous and future messages remain secure. For a mobile interview, explaining the public/private key concept and mentioning the Signal Protocol is sufficient depth. Focus on how it affects the client architecture: the encryption/decryption layer sits between the message sending logic and the network layer, and key management uses the Android Keystore.
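The public/private key idea can be demonstrated with standard JDK crypto classes. This is a deliberately simplified sketch and not the Signal Protocol: it has no ratchet, it does not verify that a public key belongs to the claimed user, and it derives the key with a plain SHA-256 hash where a real protocol would use a proper KDF such as HKDF. The function names are illustrative.

```kotlin
import java.security.KeyPair
import java.security.KeyPairGenerator
import java.security.MessageDigest
import java.security.PublicKey
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyAgreement
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

// Each user generates a key pair; only the public half is shared via the server.
fun generateIdentity(): KeyPair =
    KeyPairGenerator.getInstance("EC").apply { initialize(256) }.generateKeyPair()

// Both sides derive the same AES key from their own private key and the peer's public key.
fun deriveSharedKey(own: KeyPair, peer: PublicKey): SecretKeySpec {
    val agreement = KeyAgreement.getInstance("ECDH")
    agreement.init(own.private)
    agreement.doPhase(peer, true)
    val digest = MessageDigest.getInstance("SHA-256").digest(agreement.generateSecret())
    return SecretKeySpec(digest, "AES")
}

// Output layout: 12-byte random IV followed by ciphertext and GCM tag.
fun encrypt(key: SecretKeySpec, plaintext: String): ByteArray {
    val iv = ByteArray(12).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(128, iv))
    return iv + cipher.doFinal(plaintext.toByteArray(Charsets.UTF_8))
}

fun decrypt(key: SecretKeySpec, payload: ByteArray): String {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(128, payload, 0, 12))
    return String(cipher.doFinal(payload, 12, payload.size - 12), Charsets.UTF_8)
}
```

The server only ever relays public keys and the encrypted payload, which is the property that matters for the architecture discussion.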

How does group messaging differ from 1:1 chats?

Group messaging adds complexity in several areas:

The data model stays mostly the same — the conversationId just maps to a group instead of a pair of users. The ConversationEntity has an isGroup flag and stores participant IDs.

Common Follow-ups