
Audio broadcasting: What the hell

It’s true: streaming multimedia sources is a mess! One consequence of missing or poorly designed standards is an uncontrolled proliferation of practical solutions that are incompatible with each other. Multimedia streaming is no exception: in practice, there are countless ways to stream an audio source. Although this is not a complete and exhaustive guide, let’s try to clarify things a bit with some examples.

Transport layer

Before opening a long discussion about audio codec formats, I want to recall that audio streaming needs a transport layer to deliver audio samples from a source to a destination. Typically, the TCP/IP stack that lets us explore the Internet is also used to stream audio sources. That said, TCP/IP offers two main protocols in its transport layer: TCP and UDP. In short, the former provides slower but reliable streams, while the latter provides fast, unordered delivery. In theory, TCP should be preferred when the integrity of the information must be guaranteed, and UDP for real-time transfers. In practice, however, audio streams use either solution, or a combination of both.
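To make the trade-off concrete, here is a minimal sketch of the UDP side: a sender firing fixed-size audio chunks at a destination with no handshake, no ordering, and no delivery guarantee. The host, port, and chunk size are illustrative assumptions, not values from any real deployment.

```python
import socket

# Hypothetical endpoint -- replace with your receiver's address.
DEST = ("127.0.0.1", 5004)

# 20 ms of 8 kHz, 16-bit mono PCM: 160 samples * 2 bytes = 320 bytes.
CHUNK = 320
silence = b"\x00" * CHUNK  # stand-in for real audio samples

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Fire-and-forget: UDP reports how many bytes left this machine,
# but gives no guarantee the packet arrives, or arrives in order.
sock.sendto(silence, DEST)
```

A TCP sender would instead `connect()` first and `send()` on the resulting stream, gaining reliability and ordering at the cost of retransmission delays that real-time audio may not tolerate.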

  • HTTP stream. A common solution consists of using a standard HTTP server that accepts GET requests and replies with a common header followed by the actual audio stream. An HTTP stream turns web browsers into media players; however, listeners are not synced and each connection is a dedicated stream. All Broadcasters support HTTP streams; see a guide on how to configure VLC Media Player.
  • UDP legacy. Ideally suited for local streams, UDP seems to be the simplest solution, with unlimited clients and reduced delay. However, this method lacks reliability, offers no control mechanisms, and is rarely used nowadays.
  • RTP raw. To make things more robust, the RTP protocol introduces a variety of tricks to recover from packet loss. Typically, audio frames are encapsulated into RTP frames and then transferred over the (low-latency) UDP protocol. Moreover, RTP carries a timestamp and a sequence number that allow listeners to re-sync.
  • RTSP. RTSP is a network control protocol; it typically works in conjunction with RTCP and uses RTP as the streaming layer. More precisely, RTSP exposes a presentation description, similar to the HTTP solution, where clients may register themselves. Meanwhile, RTCP provides a feedback mechanism from receivers to the source that enables dynamic adjustment of the bandwidth used.
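The timestamp and sequence number that let RTP listeners re-sync live in a fixed 12-byte header (RFC 3550). As a sketch, the header can be packed by hand with the standard `struct` module; the payload type, SSRC, and payload here are illustrative placeholders.

```python
import struct

def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int = 0) -> bytes:
    """Build a minimal 12-byte RTP header (RFC 3550):
    version=2, no padding, no extension, no CSRCs, marker=0."""
    vpxcc = 2 << 6                 # version 2 in the top two bits
    m_pt = payload_type & 0x7F     # marker bit 0, 7-bit payload type
    return struct.pack("!BBHII",
                       vpxcc, m_pt,
                       seq & 0xFFFF,          # 16-bit sequence number
                       timestamp & 0xFFFFFFFF, # 32-bit media timestamp
                       ssrc & 0xFFFFFFFF)      # 32-bit stream identifier

# One packet: header followed by 20 ms of (placeholder) audio payload.
packet = rtp_header(seq=1, timestamp=160, ssrc=0x12345678) + b"\x00" * 160
```

A receiver reverses the `struct.unpack` to read the sequence number (detecting loss and reordering) and the timestamp (scheduling playback).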

Audio coding formats

A variety of audio coding formats have been developed to promote high fidelity and reduced size. In general, audio codecs belong to two main categories: lossless and lossy. The former includes uncompressed and compressed formats where it is always possible to reconstruct the original information, similar to ZIP archives. The latter includes all codecs that remove redundant information with the purpose of reducing the file size. Let’s list some examples supported by our Broadcasters.

  • WAV is a container covering a multitude of formats. Generally, a WAV file contains a RIFF header and a body where data may be stored without transformation. For example, linear PCM stores samples as-is in a linear format. However, a WAV file may also contain samples that are not encoded linearly, such as ADPCM, G.711 (µ-law, A-law), or G.722 ADPCM. Those techniques are particularly interesting for voice transmission.
  • MP3 is probably the most common coding format for digital audio. It is lossy, meaning that it is not possible to reconstruct the original signal; in other words, the compression loses information. Typically, the lost information is inaudible to most ears.
  • Vorbis is conceptually similar to MP3, with the difference that it was developed to be royalty-free. Although Vorbis frames may exist on their own, they are usually encapsulated into an Ogg container, producing the common Ogg Vorbis format.
  • FLAC is a lossless compressed format. This means that the compression does not lose information, similar to a ZIP archive. Although FLAC reduces the final size, the result remains bigger than MP3 or Vorbis.
  • AAC is the more advanced successor of the MP3 coding format, often used for HTTP streams in conjunction with IceCast or ShoutCast.
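The RIFF-header-plus-body layout described for WAV is easy to see from Python's standard library. This is a minimal sketch that writes one second of silent 16-bit linear PCM into an in-memory WAV file; the sample rate and channel count are arbitrary choices for illustration.

```python
import io
import wave

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit samples
    w.setframerate(44100)    # CD-quality sample rate
    # One second of silence: 44100 frames of 2 bytes each, stored as-is
    # (linear PCM, no transformation).
    w.writeframes(b"\x00\x00" * 44100)

data = buf.getvalue()
# The file starts with the RIFF header, and the body follows it.
```

Inspecting `data[:4]` shows the `RIFF` magic and `data[8:12]` the `WAVE` form type, exactly the header/body split described above.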

Particular consideration must be given to Ogg. It is not an audio codec but a multimedia container. Ogg is often associated with Vorbis, but it can encapsulate several formats. Many radio stations that use FLAC encapsulate FLAC frames into Ogg pages.
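Since Ogg is just the container, a stream of Ogg-encapsulated Vorbis and a stream of Ogg-encapsulated FLAC look the same at the transport level: every Ogg page begins with the capture pattern `OggS`. A tiny sketch of sniffing that magic:

```python
def is_ogg(first_bytes: bytes) -> bool:
    """Heuristic: an Ogg stream starts with the page capture
    pattern 'OggS', regardless of the codec carried inside
    (Vorbis, FLAC, Opus, ...)."""
    return first_bytes.startswith(b"OggS")
```

Telling Vorbis from FLAC then requires parsing the first page's payload, where each codec places its own identification header.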

What should I use?

As an engineer, the answer is: it depends!
Users have different requirements, so a solution valid for everybody does not exist. However, there are common use cases that combine different needs.

  • One to many listeners. In many use cases, you don’t need listeners to be in sync, especially if they are located all over the world. Moreover, you may prefer the flexibility to join and leave the stream. In this case, an HTTP stream distributed by a relay server running IceCast or ShoutCast may be the perfect fit. This is the ideal scenario for themed listening in shops, museums, and theaters.
  • Synced listeners. Sometimes, you want many listeners in the same environment. In that case, RTP or RTSP is the best candidate. But be careful: making synced audio is an art! Even if two devices are in perfect sync, meaning they play back the same audio at the same instant, a human listener may perceive a frustrating echo effect due to the delay introduced by the sound wave traveling from the woofer to the listener’s ear. The Broadcasters have a fine-tunable delay to compensate for this physical propagation delay.
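The acoustic delay behind that echo effect is easy to estimate: sound travels at roughly 343 m/s in air at 20 °C, so the compensation needed grows by about 3 ms per metre between speaker and listener. A back-of-the-envelope sketch (the distances are arbitrary examples):

```python
SPEED_OF_SOUND = 343.0  # m/s in air at ~20 degrees C (approximate)

def propagation_delay_ms(distance_m: float) -> float:
    """Time for a sound wave to travel distance_m metres, in milliseconds."""
    return distance_m / SPEED_OF_SOUND * 1000.0

# A listener 10 m from one speaker and next to a second, perfectly
# synced speaker hears the far one roughly 29 ms late -- well within
# the range perceived as an echo.
delay = propagation_delay_ms(10.0)
```

This is the order of magnitude a fine-tunable playback delay has to compensate for.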