Media Production Chains
This document considers larger production systems constructed from various individual devices.
The focus of this document is on timing and how the use of Time Values created and carried by different pieces of equipment can be used to achieve synchronisation. In addition, it examines how Time Values should be handled at specific system boundaries in order to achieve the expected outcomes.
Terminology
In order to ease explanation of how identity and timing apply to production chains, some common terms of reference are used. These are as follows:
- Wall Clock: A timekeeping device which provides a consistent source of time to all media processing devices on the network.
- Presentation Clock: A clock held within a media processing device which drives the output of media samples for presentation purposes. This might be calculated with reference to the Wall Clock and/or other inputs.
Worked Examples
Simple Live Video & Audio
Objective
Synchronise audio and video for display after passage through independent processing chains.
In order to demonstrate the effects of delay in the processing chain, the following delay figures are assumed:
- Camera imposes 40ms of delay
- Vision Mixer imposes 40ms of delay
- Display imposes 40ms of delay
- Microphone imposes 2.5ms of delay
- Audio Mixer imposes 7.5ms of delay
- Loudspeaker imposes 2.5ms of delay
Description
Video and audio are captured simultaneously by a camera and microphone respectively. The camera and microphone have access to a source of Wall Clock time which they use to generate Time Values for each video frame and audio sample captured. These Time Values are attached to their corresponding samples and persist with the content through the processing chain.
Due to a higher latency in the video chain than the audio chain, the video is displayed late when compared with the audio.
Each output device (display and loudspeaker) is fed with a source of Wall Clock time. By comparing the Wall Clock value with the Time Value of the incoming video or audio samples, a time difference can be calculated at each output device which corresponds to the latency incurred through the chain (including its own processing latency). By taking the largest of these calculated differences, a Presentation Clock can be derived as follows:
Max Offset = MAX(Wall Clock - Time Value, Wall Clock - Time Value, ...)
Presentation Clock = Wall Clock - Max Offset
Each output device can then be programmed with this Max Offset value in order to derive its own Presentation Clock. The device must then buffer content as it arrives until the Presentation Clock equals the content's Time Value. At this point the content can be released from the buffer and output synchronisation is achieved.
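A minimal sketch of this mechanism in Python (all names here are illustrative, and Time Values and Wall Clock readings are assumed to share an epoch and be expressed in milliseconds):

```python
def max_offset(observations):
    """Largest (Wall Clock - Time Value) difference observed across the
    output devices, i.e. the worst-case latency through any chain."""
    return max(wall_clock - time_value for wall_clock, time_value in observations)

def presentation_clock(wall_clock_now, offset):
    """The Presentation Clock runs behind the Wall Clock by the Max Offset."""
    return wall_clock_now - offset

def ready_to_release(sample_time_value, wall_clock_now, offset):
    """A buffered sample is released once the Presentation Clock
    reaches its Time Value."""
    return presentation_clock(wall_clock_now, offset) >= sample_time_value
```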
Timing Diagram
Samples are captured by the camera and microphone at time zero on the diagram. The loudspeaker then has a large (107.5ms) delay imposed upon its Presentation Clock to ensure it outputs audio samples in time with the display.
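The 107.5ms figure can be checked against the delay figures assumed above (a quick sketch, with all values in milliseconds):

```python
video_latency = 40 + 40 + 40      # camera + vision mixer + display = 120ms
audio_latency = 2.5 + 7.5 + 2.5   # microphone + audio mixer + loudspeaker = 12.5ms

max_offset = max(video_latency, audio_latency)   # 120ms across both chains
audio_buffering = max_offset - audio_latency     # 107.5ms of buffering at the loudspeaker
```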
Note that as shown above, the audio mix is able to happen before the corresponding video samples are mixed, given the level of delay imposed by the vision chain (the mixes start at 2.5ms and 40ms respectively). By using end-to-end timing it is unnecessary to keep vision and audio in sync throughout the chain, as long as the Flows which are actually being combined are in sync with each other.
Live Studio Plus Outside Broadcast
Objective
Synchronise audio and video for display after passage through independent processing chains. In addition, audio and video from a remote site must be mixed and presented in sync.
In order to demonstrate the effects of delay in the processing chain, the following delay figures are assumed:
- Cameras impose 40ms of delay
- Vision Mixer imposes 40ms of delay
- Display imposes 40ms of delay
- Microphones impose 2.5ms of delay
- Audio Mixer imposes 7.5ms of delay
- Loudspeaker imposes 2.5ms of delay
- Outside Broadcast network transit imposes 120ms of delay
Description
The Studio and Outside Broadcast share a common time reference through the use of GPS or a similar mechanism. Each captured video frame or audio sample has a Time Value attached to it which is derived from a Wall Clock reference in each location.
Studio-based audio and video pass directly to the audio and vision mixing processes respectively. Audio and video from the remote location incur additional latency prior to reaching the mixing stage due to the network transit time between the two locations.
In order to permit mixing in sync, the Outside Broadcast Flows must undergo re-timing as they enter the Studio environment. This re-timing action results in the generation of a new Source ID and new Flow ID for the content before it is mixed.
Once mixing has been achieved, the audio and video must be presented in sync via a display and loudspeaker using the Presentation Clock mechanism defined in the previous example.
Timing Diagram
Two timing diagrams are shown below in order to illustrate what would happen with and without the re-timing component at the Studio boundary.
The timing diagram above illustrates the system timing when no re-timing is performed. In this case, when the Outside Broadcast camera captures a video frame with a Time Value of zero, the Studio camera also captures a video frame with a Time Value of zero. Due to the network link latency to the Outside Broadcast site, the Studio's camera Flow must be delayed until the corresponding video frame from the Outside Broadcast arrives and can be mixed in sync.
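For illustration, the length of this delay follows from the assumed figures (a sketch, values in milliseconds):

```python
ob_arrival = 40 + 120   # OB camera delay + network transit: frame with Time Value 0 arrives at 160ms
studio_arrival = 40     # Studio camera delay: the matching frame arrives at 40ms

studio_flow_delay = ob_arrival - studio_arrival   # the Studio Flow must be held for 120ms
```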
It is important to note that the Studio and Outside Broadcast cameras cannot be mixed out of sync for two reasons:
- It would be unclear which Time Value the vision mixer should persist into its output Flow if its inputs differed.
- By mixing Flows out of sync, the relative relationship between the Outside Broadcast's video and audio Flows would be lost.
The major issue with this approach is that the Studio's timing becomes strongly coupled to the Outside Broadcast's. If a second Outside Broadcast were started mid-way through a programme and had an even longer network latency, then the Studio's vision and audio mix processes would have to be further delayed, resulting in repeated frames and audio glitches.
The solution to the previous timing problem is to introduce a re-timing component at the Studio edge. This generates new Time Values for the content arriving from the Outside Broadcast using the Studio's Time Context. By doing this, the video frame which was given a Time Value of zero at the Outside Broadcast location is re-written to a Time Value of 120. This allows it to be synchronised immediately with the next available frame from the Studio camera. By modifying the Outside Broadcast's video and audio samples by exactly the same period of time, their relative synchronisation is maintained.
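A minimal sketch of such a re-timing component, assuming Time Values expressed in milliseconds; the function shape and the use of freshly generated UUIDs for the new identifiers are illustrative rather than prescriptive:

```python
import uuid

def retime(samples, offset_ms):
    """Re-write Time Values into the Studio's Time Context by applying a
    single fixed offset, then mint a new Source ID and Flow ID for the
    re-timed output."""
    new_source_id = uuid.uuid4()   # re-timing creates a new Source identity...
    new_flow_id = uuid.uuid4()     # ...and a new Flow identity
    retimed = [(time_value + offset_ms, payload) for time_value, payload in samples]
    return new_source_id, new_flow_id, retimed

# The same offset (120ms here) must be applied to both the video and audio
# Flows so that their relative synchronisation is preserved: a frame stamped
# 0 at the Outside Broadcast becomes 120 in the Studio's Time Context.
video_out = retime([(0, "frame-0"), (20, "frame-1")], offset_ms=120)
audio_out = retime([(0, "sample-0")], offset_ms=120)
```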
An alternative to an external re-timing device would be to perform the re-timing operation within the vision and audio mixers themselves. In this approach, these independent devices must be carefully co-ordinated to ensure that the time offset applied to the vision and audio from the Outside Broadcast is identical.
Live Studios Feeding Playout Gallery
Objective
Synchronise audio and video for display after passage through two independent studios, followed by a presentation mix chain. Note that in order to simplify the explanation, only the vision chain is shown in the timing diagram below.
In order to demonstrate the effects of delay in the processing chain, the following delay figures are assumed:
- Cameras impose 40ms of delay
- Studio A Vision Mixer imposes 40ms of delay
- Studio B Vision Mixer imposes 80ms of delay
- Playout Gallery Vision Mixer imposes 40ms of delay
Description
Video and audio are captured simultaneously by a camera and microphone respectively in each studio. The camera and microphone have access to a source of Wall Clock time which they use to generate Time Values for each video frame and audio sample captured. These Time Values are attached to their corresponding samples and persist with the content through the processing chain.
Studio A's vision and audio arrive in the Playout Gallery before the corresponding samples from Studio B, due to the higher latency of Studio B's processing chain.
In order to permit mixing in sync, the input Flows from the Studios must undergo re-timing as they enter the Playout Gallery environment. This re-timing action results in the generation of a new Source ID and new Flow ID for the content before it is mixed.
Once mixing has been achieved, the audio and video must be presented in sync via a display and loudspeaker using the Presentation Clock mechanism defined in the first example.
Timing Diagram
The diagram above illustrates the re-timing action required in this scenario. It is broadly similar to the re-timing performed in the Outside Broadcast scenario (refer to the second timing diagram in that scenario, which covers the use of the re-timing component). A key difference here is that the two Studios and the Playout Gallery might be operating in the same Time Context with access to the same Wall Clock. This is a clear example of end-to-end timing only being relevant within certain bounds.
The re-timing devices in this example re-write the Time Values arriving from the Studios (whatever these values are) to approximately the current Wall Clock value. This makes them immediately suitable for consumption by the next device in the chain. As with the Outside Broadcast scenario, it is important to ensure that the relative timing relationship between the video and audio from each Studio is maintained.
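A minimal sketch of this form of re-timing, again assuming milliseconds and illustrative names; the offset is measured once per Studio and applied uniformly to its video and audio samples, which is what preserves their relative timing:

```python
def studio_offset(reference_time_value, wall_clock_now):
    """Measured once per Studio: how far its incoming Time Values
    lag the local Wall Clock."""
    return wall_clock_now - reference_time_value

def retime(time_value, offset):
    """Applied to every video and audio sample from that Studio, moving
    the Time Values to approximately the current Wall Clock value while
    leaving the video/audio relationship untouched."""
    return time_value + offset
```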
As in the Outside Broadcast scenario, without this re-timing action the Flow arriving earliest must be delayed to wait for the Flow from the other Studio, thereby delaying the Playout Gallery output.
from the other Studio thereby delaying the Playout Gallery output. This would mean that the Playout Gallery timing becomes strongly coupled to the latencies of the processing chains in the Studios; therefore it would be problematic for the Playout Gallery to accept an input from an additional Studio with a higher latency processing chain.
Remote Production
Objective
Capture video at an Outside Broadcast location in high quality (HQ) and low quality (Proxy) versions. Produce a mix based upon the low quality version at a Studio facility, before instructing the Outside Broadcast site to re-produce this mix precisely and forward it on to the Studio.
Note: The intention is to avoid sending each individual high quality camera Flow back to the Studio, but instead send back individual low quality versions and a single high quality finished programme output.
In order to demonstrate the effects of delay in the processing chain, the following delay figures are assumed:
- Cameras impose 40ms of delay
- Vision Mixers impose 40ms of delay
- Displays impose 40ms of delay
- Outside Broadcast network transit imposes 20ms of delay in both directions
Description
Video is captured simultaneously by Outside Broadcast cameras. The cameras have access to a source of Wall Clock time which they use to generate a Time Value for each video frame captured. These Time Values are attached to their corresponding samples and persist with the content through the processing chain.
Low quality (Proxy) Flows are sent to the Studio facility in order to produce a mix, whilst high quality (HQ) Flows remain at the Outside Broadcast site for later processing. Once the edit decisions have been relayed from the Studio facility, the high quality Flows can be mixed in the same way and transmitted on.
Identity in this scenario is critical. The camera HQ and Proxy Flows share a Source ID and Time Values, which ensures that the Proxy edit can be re-created precisely at the Outside Broadcast location (using the HQ Flows) prior to being sent back to the Studio facility.
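To illustrate why this shared identity matters, a sketch of re-applying the Proxy edit decisions to the HQ Flows; the data structures, Source IDs and timings here are hypothetical:

```python
edit_decisions = [
    # (Source ID, in Time Value, out Time Value), recorded against the Proxy mix
    ("camera-1-source", 0, 2000),
    ("camera-2-source", 2000, 5000),
]

def apply_edit(hq_flows, decisions):
    """Recreate the Proxy edit precisely against the HQ Flows. This works
    because HQ and Proxy share Source IDs and Time Values."""
    output = []
    for source_id, t_in, t_out in decisions:
        for time_value, frame in hq_flows[source_id]:
            if t_in <= time_value < t_out:
                output.append((time_value, frame))
    return output
```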
Timing Diagram
The diagram above shows that in this scenario it is unnecessary to modify any Time Values before they are used. However, at the Outside Broadcast location it is important to delay the HQ camera feeds from entering the vision mixer until the edit decisions on the low quality Proxy content have been relayed back from the Studio.
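As a rough illustration of the buffering this implies, assuming (hypothetically) that an edit decision is issued as soon as the Studio's vision mixer has processed the corresponding Proxy frame (values in milliseconds):

```python
camera = 40
transit = 20        # network transit, each direction
studio_mixer = 40   # assumed decision latency at the Studio

decision_return = camera + transit + studio_mixer + transit   # 120ms after capture
hq_ready = camera                                             # HQ frame available at 40ms
hq_buffering = decision_return - hq_ready                     # at least 80ms of HQ buffering
```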