7.3.2 Removing Jitter at the Receiver for Audio

For a voice application, the receiver should attempt to provide synchronous playout of voice chunks in the presence of random network jitter. This is typically done by combining the following three mechanisms:

Prefacing each chunk with a sequence number.
Prefacing each chunk with a timestamp.
Delaying playout of chunks at the receiver.

These three mechanisms, when combined, can alleviate or even eliminate the effects of jitter. We now examine two playout strategies: fixed playout delay and adaptive playout delay.

Fixed Playout Delay

With the fixed-delay strategy, the receiver attempts to play out each chunk exactly q msecs after the chunk is generated. If a chunk is timestamped at time t, the receiver plays out the chunk at time t+q, provided the chunk has arrived by then. Packets that arrive after their scheduled playout times are discarded and considered lost.
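The fixed-delay rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the Chunk type, its field names, and the function fixed_playout_time are hypothetical, and the sketch assumes the sender's timestamps and the receiver's clock share a common time base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    seq: int             # sequence number carried in the chunk header
    timestamp_ms: float  # time t at which the sender generated the chunk
    arrival_ms: float    # time at which the receiver got the chunk


def fixed_playout_time(chunk: Chunk, q_ms: float) -> Optional[float]:
    """Return the playout time t+q, or None if the chunk arrived too late.

    Assumes (for illustration only) that sender and receiver clocks
    are synchronized, so timestamps and arrival times are comparable.
    """
    playout_ms = chunk.timestamp_ms + q_ms
    if chunk.arrival_ms > playout_ms:
        return None  # missed its deadline: discarded and counted as lost
    return playout_ms
```

With q = 100 msecs, a chunk generated at t = 0 that arrives at 80 msecs is played at 100 msecs, whereas a chunk generated at t = 20 that arrives at 130 msecs has missed its 120 msec deadline and is dropped.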

Adaptive Playout Delay

The fixed-delay strategy exposes an important delay-loss trade-off. If the playout delay q is made large, most packets make their deadlines and loss is negligible, but a long delay degrades interactive conversation. If q is made small, the delay is less noticeable, but more packets miss their deadlines and are discarded.
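The delay-loss trade-off can be made concrete with a small calculation. The sketch below counts the fraction of chunks that would miss their deadline for several values of q. The helper loss_fraction and the list of end-to-end delays are made up for illustration and do not come from any measurement.

```python
def loss_fraction(delays_ms, q_ms):
    """Fraction of chunks whose end-to-end delay exceeds the playout delay q."""
    late = sum(1 for d in delays_ms if d > q_ms)
    return late / len(delays_ms)


# Hypothetical per-chunk network delays in msecs (illustrative values only).
delays = [40, 55, 60, 70, 90, 120, 150, 200]

for q in (50, 100, 200):
    print(f"q = {q:3d} msecs -> loss fraction {loss_fraction(delays, q):.3f}")
```

For these sample delays, raising q from 50 to 200 msecs drives the loss fraction from 0.875 down to 0, at the cost of every chunk being played out later.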

The natural way to deal with this trade-off is to estimate the network delay and the variance of the network delay, and to adjust the playout delay accordingly at the beginning of each talk spurt.
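One common way to realize such an estimator is with exponentially weighted moving averages of the delay and of its deviation. The sketch below takes that approach as an assumption, since the text above does not fix a particular formula. The class name AdaptivePlayout, the smoothing weight u, and the safety multiplier k are all illustrative choices. The playout offset is recomputed only when begin_talk_spurt is called, so chunks within one talk spurt keep their relative spacing.

```python
class AdaptivePlayout:
    """Sketch of an adaptive playout-delay estimator (assumed EWMA form)."""

    def __init__(self, u=0.01, k=4.0):
        self.u = u          # smoothing weight (illustrative default)
        self.k = k          # deviations added as safety margin (illustrative)
        self.d = None       # running estimate of the network delay, msecs
        self.v = 0.0        # running estimate of the delay deviation, msecs
        self.offset = None  # playout delay frozen for the current talk spurt

    def update(self, timestamp_ms, arrival_ms):
        """Fold one received chunk into the delay and deviation estimates."""
        sample = arrival_ms - timestamp_ms
        if self.d is None:
            self.d = sample  # first sample initializes the estimate
        else:
            self.d = (1 - self.u) * self.d + self.u * sample
        self.v = (1 - self.u) * self.v + self.u * abs(sample - self.d)

    def playout_delay(self):
        """Current estimate: mean delay plus k deviations."""
        return self.d + self.k * self.v

    def begin_talk_spurt(self):
        """Freeze the playout delay for all chunks of the new talk spurt."""
        self.offset = self.playout_delay()

    def playout_time(self, timestamp_ms):
        """Playout time for a chunk in the current talk spurt."""
        return timestamp_ms + self.offset
```

A receiver using this sketch would call update for every arriving chunk, call begin_talk_spurt when it detects the first chunk of a new talk spurt, and schedule each chunk with playout_time. A larger k trades extra delay for fewer late chunks.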