Music streaming combines audio playback, background services, media session integration, offline support, and queue management.
The essential features are browse and search (discover music by artist, album, genre, or keyword), audio playback (stream tracks with play/pause/seek/skip), playlists (create, edit, reorder, share), and offline mode (download tracks for playback without network). Beyond that, the app needs background playback so music continues when the user leaves the app, a playback queue with shuffle and repeat, and media controls on the lock screen, notification, and Bluetooth devices.
Three things matter most for a music streaming app:
Latency matters less than in video — users tolerate 2-3 seconds of initial buffering when they tap a song.
Focus on the playback path end-to-end: user taps a song, the app builds a queue, starts a foreground service, streams audio through ExoPlayer, shows controls in the notification and lock screen, and handles interruptions like phone calls. Then go deep on one or two areas — offline downloads, gapless playback, or queue management. Skip social features, lyrics, and recommendation algorithms unless the interviewer asks.
The architecture has three layers:
The playback engine runs in a MediaSessionService, separate from the UI lifecycle. The UI connects to it through a MediaController. This separation means playback survives activity destruction, configuration changes, and even the app being removed from recents.
MediaPlayer is the old framework API — limited format support, poor error handling, no adaptive streaming. ExoPlayer (now part of AndroidX Media3) supports DASH, HLS, and progressive streams, handles DRM, supports gapless playback natively, and is actively maintained by Google.
class AudioPlayer(context: Context) {
private val player = ExoPlayer.Builder(context).build()
fun play(url: String) {
val mediaItem = MediaItem.fromUri(url)
player.setMediaItem(mediaItem)
player.prepare()
player.play()
}
fun pause() = player.pause()
fun seekTo(positionMs: Long) = player.seekTo(positionMs)
fun release() = player.release()
}
ExoPlayer handles buffering, format detection, and codec selection internally. It also supports playlists natively: set multiple MediaItems and it handles transitions between tracks. Media3 is ExoPlayer’s current home; the same player ships under the androidx.media3 packages alongside the session and UI modules it is designed to work with.
Three main API groups: catalog, playlist, and stream.
The catalog and playlist APIs use standard REST. The stream API returns a URL that ExoPlayer fetches directly — the app never downloads the audio bytes through its own networking layer.
The key entities are Track, Playlist, and PlaybackQueue.
data class Track(
val id: String,
val title: String,
val artist: String,
val albumId: String,
val durationMs: Long,
val artworkUrl: String,
val streamUrl: String
)
data class Playlist(
val id: String,
val name: String,
val ownerId: String,
val trackIds: List<String>,
val createdAt: Long,
val updatedAt: Long
)
data class PlaybackQueue(
val tracks: List<Track>,
val currentIndex: Int,
val shuffleEnabled: Boolean,
val repeatMode: RepeatMode // OFF, ONE, ALL
)
Track metadata is cached in Room for offline access and fast loading. The streamUrl is short-lived — the app fetches a fresh URL from the stream API when the user actually plays the track. Playlists are stored locally and synced with the server.
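Because the stream URL is short-lived, one way to handle it is to keep each URL together with its expiry time and refetch only when it is about to go stale. The sketch below is illustrative only: `StreamUrl`, `StreamUrlResolver`, the `fetch` lambda (standing in for the stream API call), and the 30-second safety margin are all assumptions, not part of any real API.

```kotlin
// Hypothetical model of a signed, short-lived stream URL.
data class StreamUrl(val url: String, val expiresAtMs: Long)

class StreamUrlResolver(
    // Stands in for the network call to the stream API (assumption).
    private val fetch: (trackId: String) -> StreamUrl,
    // Injectable clock so the expiry logic can be exercised without waiting.
    private val now: () -> Long = { System.currentTimeMillis() },
    // Refetch slightly early so a URL does not expire mid-request (assumed value).
    private val safetyMarginMs: Long = 30_000
) {
    private val cached = mutableMapOf<String, StreamUrl>()

    fun resolve(trackId: String): String {
        val hit = cached[trackId]
        if (hit != null && hit.expiresAtMs - safetyMarginMs > now()) return hit.url
        val fresh = fetch(trackId)
        cached[trackId] = fresh
        return fresh.url
    }
}
```

Calling `resolve` right before handing the URL to the player keeps expired URLs out of the persisted Track rows.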
Use a foreground service with a persistent notification. Without a service, Android kills the process shortly after the user leaves, and playback stops. Media3’s MediaSessionService handles the foreground service lifecycle automatically — it starts as foreground when playback begins and stops when playback ends.
class PlaybackService : MediaSessionService() {
private lateinit var player: ExoPlayer
private lateinit var mediaSession: MediaSession
override fun onCreate() {
super.onCreate()
player = ExoPlayer.Builder(this).build()
mediaSession = MediaSession.Builder(this, player).build()
}
override fun onGetSession(
controllerInfo: MediaSession.ControllerInfo
): MediaSession = mediaSession
override fun onDestroy() {
player.release()
mediaSession.release()
super.onDestroy()
}
}
On Android 14+, declare android:foregroundServiceType="mediaPlayback" on the service in the manifest and add the FOREGROUND_SERVICE_MEDIA_PLAYBACK permission alongside FOREGROUND_SERVICE. The MediaSessionService also creates the notification with playback controls automatically. The UI binds to this service through a MediaController and disconnects freely without affecting playback.
Audio focus is how Android coordinates audio between apps. When your app starts playing, it requests focus. If another app (navigation, phone call) needs audio, your app must respond — pause for a phone call, lower volume for a navigation prompt.
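// AudioFocusRequest requires API 26+.
class AudioFocusHandler(context: Context, private val player: ExoPlayer) {
    private val audioManager = context.getSystemService(AudioManager::class.java)
    private val focusRequest = AudioFocusRequest
        .Builder(AudioManager.AUDIOFOCUS_GAIN)
        .setAudioAttributes(
            AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_MEDIA)
                .setContentType(AudioAttributes.CONTENT_TYPE_MUSIC)
                .build()
        )
        .setOnAudioFocusChangeListener { change ->
            when (change) {
                AudioManager.AUDIOFOCUS_LOSS -> player.pause()
                AudioManager.AUDIOFOCUS_LOSS_TRANSIENT -> player.pause()
                AudioManager.AUDIOFOCUS_LOSS_TRANSIENT_CAN_DUCK ->
                    player.setVolume(0.2f)
                AudioManager.AUDIOFOCUS_GAIN -> {
                    player.setVolume(1.0f)
                    player.play()
                }
            }
        }
        .build()

    // Call before starting playback; returns true if focus was granted.
    fun requestFocus(): Boolean =
        audioManager.requestAudioFocus(focusRequest) ==
            AudioManager.AUDIOFOCUS_REQUEST_GRANTED

    // Call when playback stops so other apps can take focus.
    fun abandonFocus() {
        audioManager.abandonAudioFocusRequest(focusRequest)
    }
}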
AUDIOFOCUS_LOSS means another app took focus permanently, so pause. AUDIOFOCUS_LOSS_TRANSIENT means temporary loss like a phone call: pause and resume when focus returns. AUDIOFOCUS_LOSS_TRANSIENT_CAN_DUCK means lower volume instead of pausing, useful for navigation prompts over music. ExoPlayer handles all of this automatically if you pass handleAudioFocus = true as the second argument to setAudioAttributes on the player or its builder.
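For reference, letting the player manage focus itself might look like the following configuration fragment. It assumes the Media3 artifacts (`androidx.media3:media3-exoplayer` and `media3-common`) are on the classpath; `buildMusicPlayer` is a hypothetical helper name.

```kotlin
import android.content.Context
import androidx.media3.common.AudioAttributes
import androidx.media3.common.C
import androidx.media3.exoplayer.ExoPlayer

// Hypothetical helper: builds a player that requests and reacts to audio focus itself.
fun buildMusicPlayer(context: Context): ExoPlayer {
    val attributes = AudioAttributes.Builder()
        .setUsage(C.USAGE_MEDIA)
        .setContentType(C.AUDIO_CONTENT_TYPE_MUSIC)
        .build()
    return ExoPlayer.Builder(context)
        // Second argument is handleAudioFocus: true lets the player pause, duck,
        // and resume in response to focus changes.
        .setAudioAttributes(attributes, true)
        .build()
}
```

With this in place a hand-written focus listener is only needed when the app wants behavior that differs from the player's defaults.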
Gapless playback means no silence gap between consecutive tracks. This matters for live albums, classical music, and DJ mixes where tracks flow into each other. ExoPlayer supports it natively when you use a playlist of MediaItems — it pre-buffers the next track and trims encoder delay/padding using the LAME header in MP3 files.
// Gapless: load the tracks as one playlist (tracks is a List<Track>)
val items = tracks.map { MediaItem.fromUri(it.streamUrl) }
player.setMediaItems(items)
player.prepare()
player.play()
Crossfade is different — the current track fades out while the next fades in, overlapping by a configurable duration (Spotify offers 1-12 seconds). This requires more work because ExoPlayer doesn’t support crossfade out of the box. One approach is to use two player instances — one fading out, one fading in — and mix their output. You’d start the second player N seconds before the current track ends, ramp down the first player’s volume while ramping up the second, then release the first player when the fade completes. True gapless is simpler and is what ExoPlayer does by default.
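The volume ramp for the two-player approach can be computed independently of any player API. The sketch below uses an equal-power (sine/cosine) curve, which is one common choice and not something ExoPlayer provides; `FadeVolumes`, `fadeProgress`, and `crossfadeVolumes` are hypothetical names.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.sin

data class FadeVolumes(val outgoing: Float, val incoming: Float)

// How far through the fade window the outgoing track is:
// 0.0 before the window starts, 1.0 at the end of the track.
fun fadeProgress(positionMs: Long, durationMs: Long, fadeMs: Long): Double {
    require(fadeMs > 0) { "fadeMs must be positive" }
    val fadeStartMs = durationMs - fadeMs
    return ((positionMs - fadeStartMs).toDouble() / fadeMs).coerceIn(0.0, 1.0)
}

// Equal-power curve: the outgoing track follows cos, the incoming track follows sin,
// so the combined power stays roughly constant across the overlap.
fun crossfadeVolumes(progress: Double): FadeVolumes {
    val angle = progress.coerceIn(0.0, 1.0) * PI / 2
    return FadeVolumes(outgoing = cos(angle).toFloat(), incoming = sin(angle).toFloat())
}
```

A timer or position poll would call `fadeProgress` with the outgoing player's position and apply the two results to each player's volume.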
Offline downloads let users save tracks for playback without network. Use WorkManager to schedule downloads — it handles network constraints, retry logic, and survives app restarts. Store downloaded audio in the app’s internal storage or encrypted external storage.
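class DownloadTrackWorker(
    context: Context,
    params: WorkerParameters
) : CoroutineWorker(context, params) {
    // api, httpClient, and trackDao are app-level dependencies, injected or
    // looked up elsewhere (for example through a custom WorkerFactory).
    override suspend fun doWork(): Result {
        val trackId = inputData.getString("trackId") ?: return Result.failure()
        val file = File(applicationContext.filesDir, "offline/$trackId.enc")
        return try {
            val streamUrl = api.getStreamUrl(trackId)
            file.parentFile?.mkdirs() // the offline/ directory may not exist yet
            httpClient.downloadTo(streamUrl, file)
            trackDao.markDownloaded(trackId, file.absolutePath)
            Result.success()
        } catch (e: IOException) {
            file.delete() // drop the partial file so it is never played
            Result.retry() // let WorkManager retry with backoff
        }
    }
}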
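// Schedule a playlist download, one unique work item per track
fun downloadPlaylist(context: Context, playlist: Playlist) {
    val constraints = Constraints.Builder()
        .setRequiredNetworkType(NetworkType.CONNECTED)
        .build()
    val workManager = WorkManager.getInstance(context)
    playlist.trackIds.forEach { trackId ->
        val request = OneTimeWorkRequestBuilder<DownloadTrackWorker>()
            .setInputData(workDataOf("trackId" to trackId))
            .setConstraints(constraints)
            .build()
        // KEEP avoids queuing a second download of the same track
        workManager.enqueueUniqueWork("download-$trackId", ExistingWorkPolicy.KEEP, request)
    }
}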
When playing a track, check if it’s downloaded first. If yes, play from local storage. If not and there’s no network, skip to the next downloaded track. For DRM content, the downloaded files should be encrypted — only the app can decrypt and play them. Track download state in Room so the UI can show progress and filter for offline-available content.
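The source-selection rule described above can be sketched as a pure function. `QueueEntry`, `PlaybackSource`, and `resolveSource` are hypothetical names introduced for illustration; in a real app the `localPath` would come from the download state stored in Room.

```kotlin
// Hypothetical view of a queued track; localPath is null when it is not downloaded.
data class QueueEntry(val trackId: String, val localPath: String?)

sealed interface PlaybackSource {
    data class Local(val path: String) : PlaybackSource
    data class Remote(val trackId: String) : PlaybackSource
}

// Returns the index to play and where to read it from,
// or null if nothing from startIndex onward is playable.
fun resolveSource(
    queue: List<QueueEntry>,
    startIndex: Int,
    isOnline: Boolean
): Pair<Int, PlaybackSource>? {
    for (i in startIndex.coerceAtLeast(0) until queue.size) {
        val entry = queue[i]
        val path = entry.localPath
        // Prefer the local copy even when online: no network, no buffering.
        if (path != null) return i to PlaybackSource.Local(path)
        if (isOnline) return i to PlaybackSource.Remote(entry.trackId)
        // Offline and not downloaded: skip ahead to the next entry.
    }
    return null
}
```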
The queue holds the list of tracks to play. It needs to support shuffle, repeat, add-next, add-to-end, remove, and reorder. The key challenge is shuffle — when the user enables it, the current track stays playing and the rest get shuffled. When they disable it, the queue returns to the original order at the current track’s position.
class PlaybackQueue {
    private val originalOrder = mutableListOf<Track>()
    private val shuffledOrder = mutableListOf<Track>()
    private var currentIndex = 0
    private var shuffleEnabled = false
    private var repeatMode = RepeatMode.OFF
    private val activeQueue: List<Track>
        get() = if (shuffleEnabled) shuffledOrder else originalOrder

    fun next(): Track? {
        if (activeQueue.isEmpty()) return null
        if (repeatMode == RepeatMode.ONE) return activeQueue[currentIndex]
        if (currentIndex + 1 < activeQueue.size) {
            currentIndex++
        } else if (repeatMode == RepeatMode.ALL) {
            currentIndex = 0 // wrap around
        } else {
            return null // end of queue; currentIndex stays on the last track
        }
        return activeQueue[currentIndex]
    }

    fun toggleShuffle() {
        val current = activeQueue.getOrNull(currentIndex)
        shuffleEnabled = !shuffleEnabled
        if (current == null) return // empty queue: nothing to reorder
        if (shuffleEnabled) {
            shuffledOrder.clear()
            shuffledOrder.addAll(originalOrder.shuffled())
            // Keep the playing track at the front of the shuffled order
            shuffledOrder.remove(current)
            shuffledOrder.add(0, current)
            currentIndex = 0
        } else {
            // Return to the playing track's position in the original order
            currentIndex = originalOrder.indexOf(current)
        }
    }
}
“Play next” inserts a track right after currentIndex. “Add to queue” appends to the end. Both operations need to update shuffledOrder and originalOrder consistently. Persist the queue to SharedPreferences or Room so it survives process death — save the track IDs, current index, shuffle state, and repeat mode.
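One way to keep both orders consistent is to apply every insertion to both lists, placing a "play next" item after the current track in each. This is a minimal sketch under that assumption; `DualOrderQueue` is a hypothetical standalone class, generic over the item type so it can be exercised without the Track model.

```kotlin
class DualOrderQueue<T>(
    original: List<T>,
    shuffled: List<T>,
    private val shuffleEnabled: Boolean,
    private val currentIndex: Int // index into the active order
) {
    val originalOrder: MutableList<T> = original.toMutableList()
    val shuffledOrder: MutableList<T> = shuffled.toMutableList()

    private val active: MutableList<T>
        get() = if (shuffleEnabled) shuffledOrder else originalOrder
    private val inactive: MutableList<T>
        get() = if (shuffleEnabled) originalOrder else shuffledOrder

    // Insert right after the current track in BOTH orders, so toggling
    // shuffle later still finds the item next to the track it followed.
    fun playNext(item: T) {
        val current = active.getOrNull(currentIndex)
        if (current == null) { // empty queue: same as appending
            addToEnd(item)
            return
        }
        active.add(currentIndex + 1, item)
        // indexOf returns -1 if the track is missing, which puts the item first.
        inactive.add(inactive.indexOf(current) + 1, item)
    }

    // Append to both orders.
    fun addToEnd(item: T) {
        originalOrder.add(item)
        shuffledOrder.add(item)
    }
}
```

Whether an appended track should land at the end of the shuffled order or at a random later position is a product decision; the sketch picks the simpler option.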
MediaSession is the bridge between your player and the Android system. It publishes what’s playing (title, artist, album art, duration) and the playback state (playing, paused, position). The system uses this to show controls on the lock screen, notification, Bluetooth devices, Wear OS, Android Auto, and Google Assistant.
val mediaSession = MediaSession.Builder(context, player)
.setCallback(object : MediaSession.Callback {
override fun onAddMediaItems(
session: MediaSession,
controller: MediaSession.ControllerInfo,
mediaItems: MutableList<MediaItem>
): ListenableFuture<List<MediaItem>> {
val resolved = mediaItems.map { resolveToStreamUrl(it) }
return Futures.immediateFuture(resolved)
}
})
.build()
Media3’s MediaSession syncs with ExoPlayer state automatically — you don’t manually update the session on every play/pause/skip. The MediaSessionService creates the notification from the session. Any client can connect through a MediaController: the notification, lock screen, a car display via Bluetooth AVRCP, or Google Assistant. The onAddMediaItems callback is where you resolve a search query or media ID into a playable stream URL.
ExoPlayer manages buffering through its LoadControl. It downloads audio data ahead of the playback position and keeps a configurable amount in memory. The four key parameters are: the minimum buffer the player tries to maintain, the maximum buffer it will hold, the buffer required before playback starts (or resumes after a seek), and the buffer required to resume after a rebuffer.
val player = ExoPlayer.Builder(context)
    .setLoadControl(
        DefaultLoadControl.Builder()
            .setBufferDurationsMs(
                15_000, // minBufferMs: keep loading until at least this much is buffered
                50_000, // maxBufferMs: stop loading once this much is buffered
                2_500, // bufferForPlaybackMs: needed to start playback or resume after a seek
                5_000 // bufferForPlaybackAfterRebufferMs: needed to resume after a rebuffer
            )
            .build()
    )
    .build()
A 50-second max buffer is reasonable for music — audio files are small compared to video, so this uses minimal memory. ExoPlayer also pre-buffers the next track in a playlist (“next-window loading”) when the current buffer is full enough, which is how gapless transitions work. For adaptive bitrate, the server provides the track at multiple quality levels (96, 160, 320 kbps) via HLS or DASH. ExoPlayer’s AdaptiveTrackSelection picks the best quality the network can sustain. In practice, most music apps let the user choose a quality setting and request that bitrate directly.
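A user-selected quality setting might map to a requested bitrate as sketched below. The three bitrates come from the levels mentioned above; the `QualitySetting` enum, the `AUTO` mode, and its rule of leaving 2x headroom over the measured bandwidth are assumptions made for illustration.

```kotlin
enum class QualitySetting { LOW, NORMAL, HIGH, AUTO }

// Available levels in ascending order (kbps).
val bitratesKbps = listOf(96, 160, 320)

fun chooseBitrateKbps(setting: QualitySetting, measuredBandwidthKbps: Int): Int =
    when (setting) {
        QualitySetting.LOW -> 96
        QualitySetting.NORMAL -> 160
        QualitySetting.HIGH -> 320
        // AUTO (assumed rule): the highest level that fits in half the measured
        // bandwidth, falling back to the lowest level on very slow links.
        QualitySetting.AUTO ->
            bitratesKbps.lastOrNull { it * 2 <= measuredBandwidthKbps }
                ?: bitratesKbps.first()
    }
```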
Cache recently streamed audio to avoid re-downloading when the user replays a track. ExoPlayer’s CacheDataSource wraps the network source with a disk cache.
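// Create one instance per process: SimpleCache allows only a single
// instance per cache directory at a time.
class AudioCacheManager(context: Context) {
    private val cache = SimpleCache(
        File(context.cacheDir, "audio_cache"),
        // The evictor takes a Long byte count, so force Long arithmetic
        LeastRecentlyUsedCacheEvictor(500L * 1024 * 1024),
        StandaloneDatabaseProvider(context)
    )

    fun buildDataSourceFactory(): DataSource.Factory {
        return CacheDataSource.Factory()
            .setCache(cache)
            .setUpstreamDataSourceFactory(DefaultHttpDataSource.Factory())
            // Fall back to the network if reading from the cache fails
            .setFlags(CacheDataSource.FLAG_IGNORE_CACHE_ON_ERROR)
    }
}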
val player = ExoPlayer.Builder(context)
.setMediaSourceFactory(
DefaultMediaSourceFactory(cacheManager.buildDataSourceFactory())
)
.build()
The LeastRecentlyUsedCacheEvictor evicts the least recently used cache data once the cache exceeds 500 MB. This is separate from explicit downloads: cached tracks get evicted when space is needed, while downloaded tracks stay until the user removes them. At 320 kbps a four-minute track is about 9.6 MB, so 500 MB holds roughly 50-60 songs. For predictive prefetch, you could cache the first 30 seconds of the next few tracks in the queue so playback starts instantly even before ExoPlayer’s built-in pre-buffering kicks in.
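The capacity estimate is simple arithmetic on bitrate and track length. A rough sketch, assuming constant-bitrate audio and ignoring container and cache-index overhead; `tracksThatFit` is a hypothetical helper.

```kotlin
// Approximate number of whole tracks that fit in a cache of cacheBytes.
fun tracksThatFit(cacheBytes: Long, bitrateKbps: Int, trackSeconds: Int): Long {
    // kbps -> bytes per second (1 kbit = 1000 bits), times the track length.
    val bytesPerTrack = bitrateKbps * 1000L / 8 * trackSeconds
    return cacheBytes / bytesPerTrack
}

// 500 MB cache, 4-minute tracks
val highQuality = tracksThatFit(500L * 1024 * 1024, bitrateKbps = 320, trackSeconds = 240) // 54
val lowQuality = tracksThatFit(500L * 1024 * 1024, bitrateKbps = 128, trackSeconds = 240)  // 136
```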
Android provides Equalizer, BassBoost, Virtualizer, and LoudnessEnhancer through the android.media.audiofx package. These attach to an audio session ID, which ExoPlayer exposes.
class AudioEffectsManager(player: ExoPlayer) {
private val sessionId = player.audioSessionId
private val equalizer = Equalizer(0, sessionId).apply {
enabled = true
}
fun setPreset(presetIndex: Short) {
equalizer.usePreset(presetIndex)
}
fun setBandLevel(band: Short, level: Short) {
equalizer.setBandLevel(band, level)
}
fun release() {
equalizer.release()
}
}
The Equalizer has a fixed number of bands (typically 5), each with a frequency center and a gain range. You can use built-in presets (Rock, Pop, Jazz) or let the user adjust bands manually. Save the user’s EQ settings per profile in SharedPreferences and reapply them when the player is created. The audio effects only work when the player has an active audio session — create them after player.prepare() and release them when the player is released.
When the user connects Bluetooth headphones, unplugs wired headphones, or connects to a car, the app must respond correctly. Android sends ACTION_AUDIO_BECOMING_NOISY when headphones disconnect — you must pause playback to avoid blasting through the speaker.
class NoisyReceiver(private val player: ExoPlayer) : BroadcastReceiver() {
override fun onReceive(context: Context, intent: Intent) {
if (intent.action == AudioManager.ACTION_AUDIO_BECOMING_NOISY) {
player.pause()
}
}
}
For Bluetooth headset buttons (play/pause, skip), MediaSession handles it automatically: the system routes media button events to the active session. For Bluetooth car displays, the MediaSession publishes track metadata through AVRCP automatically. For Android Auto, MediaSessionService already provides the integration: Auto connects as a MediaController and displays the queue and controls. For Cast (Chromecast), you’d add CastPlayer from Media3’s Cast module, which is built on the Cast SDK. It implements the same Player interface as ExoPlayer, so you can swap the active player and the rest of the app (session, notification, UI) works unchanged.
Search needs to be fast and handle partial input. Show local results instantly from cached metadata while the network request is in flight. Debounce the search input by 300ms to avoid flooding the API with keystrokes.
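// debounce is @FlowPreview and flatMapLatest is @ExperimentalCoroutinesApi
@OptIn(FlowPreview::class, ExperimentalCoroutinesApi::class)
class SearchViewModel(
    private val musicRepository: MusicRepository
) : ViewModel() {
    private val _query = MutableStateFlow("")

    // Called from the search field on every text change
    fun onQueryChanged(text: String) {
        _query.value = text
    }

    val results: StateFlow<SearchResult> = _query
        .debounce(300)
        .filter { it.length >= 2 }
        // Cancels the previous search when a new query arrives
        .flatMapLatest { query ->
            musicRepository.search(query)
        }
        .stateIn(viewModelScope, SharingStarted.Lazily, SearchResult.Empty)
}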
The repository emits local matches first (tracks, artists, albums cached in Room), then appends server results when they arrive. For recommendations, the server does the heavy lifting — collaborative filtering, listening history analysis, genre graphs. The client’s job is to fetch and display personalized playlists (like Discover Weekly) and show “similar tracks” or “fans also like” sections on artist pages. Cache recommendation results aggressively since they only update daily or weekly.
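The "local first, then append server results" step needs a merge that does not show the same item twice. A minimal sketch, assuming results are identified by a stable ID; `SearchHit` and `mergeResults` are hypothetical names.

```kotlin
data class SearchHit(val id: String, val title: String)

// Local hits keep their position; a server hit with an ID that was
// already shown from the local cache is dropped.
fun mergeResults(local: List<SearchHit>, server: List<SearchHit>): List<SearchHit> =
    (local + server).distinctBy { it.id }
```

Keeping the local copy in place avoids rows jumping around on screen when the server response arrives.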