Media Production Chains
This document considers larger production systems constructed from various individual devices.
The focus of this document is on timing and how the use of Time Values created and carried by different pieces of equipment can be used to achieve synchronisation. In addition, it examines how Time Values should be handled at specific system boundaries in order to achieve the expected outcomes.
Terminology
In order to ease explanation of how identity and timing apply to production chains, some common terms of reference are used. These are as follows:
- Wall Clock: A timekeeping device which provides a consistent source of time to all media processing devices on the network.
- Presentation Clock: A clock held within a media processing device which drives the output of media samples for presentation purposes. This might be calculated with reference to the Wall Clock and/or other inputs.
Worked Examples
Simple Live Video & Audio
Objective
Synchronise audio and video for display after passage through independent processing chains.
In order to demonstrate the effects of delay in the processing chain, the following delay figures are assumed:
- Camera imposes 40ms of delay
- Vision Mixer imposes 40ms of delay
- Display imposes 40ms of delay
- Microphone imposes 2.5ms of delay
- Audio Mixer imposes 7.5ms of delay
- Loudspeaker imposes 2.5ms of delay
Description
Video and audio are captured simultaneously by a camera and microphone respectively. The camera and microphone have access to a source of Wall Clock time which they use to generate Time Values for each video frame and audio sample captured. These Time Values are attached to their corresponding samples and persist with the content through the processing chain.
Due to a higher latency in the video chain than the audio chain, the video is displayed late when compared with the audio.
Each output device (display and loudspeaker) is fed with a source of Wall Clock time. By comparing the Wall Clock value with the Time Value of the incoming video or audio samples, a time difference can be calculated at each output device which corresponds to the latency incurred through the chain (including its own processing latency). By taking the largest of these calculated differences, a Presentation Clock can be derived as follows:
Max Offset = MAX(Wall Clock - Time Value, Wall Clock - Time Value, ...)
Presentation Clock = Wall Clock - Max Offset
Each output device can then be programmed with this Max Offset value in order to derive its own Presentation Clock. The device must then buffer content as it arrives until the Presentation Clock equals the content's Time Value. At this point the content can be released from the buffer and output synchronisation is achieved.
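A minimal sketch of this mechanism in Python (all names here are illustrative, and Time Values and Wall Clock readings are assumed to share an epoch and be expressed in milliseconds):

```python
def max_offset(observations):
    """Largest (Wall Clock - Time Value) difference observed across the
    output devices, i.e. the worst-case latency through any chain."""
    return max(wall_clock - time_value for wall_clock, time_value in observations)

def presentation_clock(wall_clock_now, offset):
    """The Presentation Clock runs behind the Wall Clock by the Max Offset."""
    return wall_clock_now - offset

def ready_to_release(sample_time_value, wall_clock_now, offset):
    """A buffered sample is released once the Presentation Clock
    reaches its Time Value."""
    return presentation_clock(wall_clock_now, offset) >= sample_time_value
```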
Timing Diagram
Samples are captured by the camera and microphone at time zero on the diagram. The loudspeaker then has a large (107.5ms) delay imposed upon its Presentation Clock to ensure it outputs audio samples in time with the display.
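The 107.5ms figure can be checked against the delay figures assumed above (a quick sketch, with all values in milliseconds):

```python
video_latency = 40 + 40 + 40      # camera + vision mixer + display = 120ms
audio_latency = 2.5 + 7.5 + 2.5   # microphone + audio mixer + loudspeaker = 12.5ms

max_offset = max(video_latency, audio_latency)   # 120ms across both chains
audio_buffering = max_offset - audio_latency     # 107.5ms of buffering at the loudspeaker
```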
Note that as shown above, the audio mix is able to happen before the corresponding video samples are mixed, given the level of delay imposed by the vision chain (the mixes start at 2.5ms and 40ms respectively). By using end-to-end timing it is unnecessary to keep vision and audio in sync throughout the chain, as long as the Flows which are actually being combined are in sync with each other.
Live Studio Plus Outside Broadcast
Objective
Synchronise audio and video for display after passage through independent processing chains. In addition, audio and video from a remote site must be mixed and presented in sync.
In order to demonstrate the effects of delay in the processing chain, the following delay figures are assumed:
- Cameras impose 40ms of delay
- Vision Mixer imposes 40ms of delay
- Display imposes 40ms of delay
- Microphones impose 2.5ms of delay
- Audio Mixer imposes 7.5ms of delay
- Loudspeaker imposes 2.5ms of delay
- Outside Broadcast network transit imposes 120ms of delay
Description
The Studio and Outside Broadcast share a common time reference through the use of GPS or a similar mechanism. Each captured video frame or audio sample has a Time Value attached to it which is derived from a Wall Clock reference in each location.
Studio-based audio and video pass directly to the audio and vision mixing processes respectively. Audio and video from the remote location incur additional latency prior to reaching the mixing stage due to the network transit time between the two locations.
In order to permit mixing in sync, the Outside Broadcast Flows must undergo re-timing as they enter the Studio environment. This re-timing action results in the generation of a new Source ID and new Flow ID for the content before it is mixed.
Once mixing has been achieved, the audio and video must be presented in sync via a display and loudspeaker using the Presentation Clock mechanism defined in the previous example.
Timing Diagram
Two timing diagrams are shown below in order to illustrate what would happen with and without the re-timing component at the Studio boundary.
The timing diagram above illustrates the system timing when no re-timing is performed. In this case, when the Outside Broadcast camera captures a video frame with a Time Value of zero, the Studio camera also captures a video frame with a Time Value of zero. Due to the network link latency to the Outside Broadcast site, the Studio's camera Flow must be delayed until the corresponding video frame from the Outside Broadcast arrives and can be mixed in sync.
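For illustration, the length of this delay follows from the assumed figures (a sketch, values in milliseconds):

```python
ob_arrival = 40 + 120   # OB camera delay + network transit: frame with Time Value 0 arrives at 160ms
studio_arrival = 40     # Studio camera delay: the matching frame arrives at 40ms

studio_flow_delay = ob_arrival - studio_arrival   # the Studio Flow must be held for 120ms
```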
It is important to note that the Studio and Outside Broadcast cameras cannot be mixed out of sync for two reasons:
- It would be unclear which Time Value the vision mixer should persist into its output Flow if its inputs differed.
- By mixing Flows out of sync, the relative relationship between the Outside Broadcast's video and audio Flows would be lost.
The major issue with this approach is that the Studio's timing becomes strongly coupled to the Outside Broadcast's. If a second Outside Broadcast were started mid-way through a programme and had an even longer network latency, then the Studio's vision and audio mix processes would have to be further delayed, resulting in repeated frames and audio glitches.
The solution to the previous timing problem is to introduce a re-timing component at the Studio edge. This generates new Time Values for the content arriving from the Outside Broadcast using the Studio's Time Context. By doing this, the video frame which was given a Time Value of zero at the Outside Broadcast location is re-written to a Time Value of 120. This allows it to be synchronised immediately with the next available frame from the Studio camera. By modifying the Outside Broadcast's video and audio samples by exactly the same period of time, their relative synchronisation is maintained.
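A minimal sketch of such a re-timing component, assuming Time Values expressed in milliseconds; the function shape and the use of freshly generated UUIDs for the new identifiers are illustrative rather than prescriptive:

```python
import uuid

def retime(samples, offset_ms):
    """Re-write Time Values into the Studio's Time Context by applying a
    single fixed offset, then mint a new Source ID and Flow ID for the
    re-timed output."""
    new_source_id = uuid.uuid4()   # re-timing creates a new Source identity...
    new_flow_id = uuid.uuid4()     # ...and a new Flow identity
    retimed = [(time_value + offset_ms, payload) for time_value, payload in samples]
    return new_source_id, new_flow_id, retimed

# The same offset (120ms here) must be applied to both the video and audio
# Flows so that their relative synchronisation is preserved: a frame stamped
# 0 at the Outside Broadcast becomes 120 in the Studio's Time Context.
video_out = retime([(0, "frame-0"), (20, "frame-1")], offset_ms=120)
audio_out = retime([(0, "sample-0")], offset_ms=120)
```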
An alternative to an external re-timing device would be to perform the re-timing operation within the vision and audio mixers themselves. In this approach, these independent devices must be carefully co-ordinated to ensure that the time offset applied to the vision and audio from the Outside Broadcast is identical.
Live Studios Feeding Playout Gallery
Objective
Synchronise audio and video for display after passage through two independent studios, followed by a presentation mix chain. Note that in order to simplify the explanation, only the vision chain is shown in the timing diagram below.
In order to demonstrate the effects of delay in the processing chain, the following delay figures are assumed:
- Cameras impose 40ms of delay
- Studio A Vision Mixer imposes 40ms of delay
- Studio B Vision Mixer imposes 80ms of delay
- Playout Gallery Vision Mixer imposes 40ms of delay
Description
Video and audio are captured simultaneously by a camera and microphone respectively in each studio. The camera and microphone have access to a source of Wall Clock time which they use to generate Time Values for each video frame and audio sample captured. These Time Values are attached to their corresponding samples and persist with the content through the processing chain.
Studio A's vision and audio arrive in the Playout Gallery before the corresponding samples from Studio B, due to the higher latency of Studio B's processing chain.
In order to permit mixing in sync, the input Flows from the Studios must undergo re-timing as they enter the Playout Gallery environment. This re-timing action results in the generation of a new Source ID and new Flow ID for the content before it is mixed.
Once mixing has been achieved, the audio and video must be presented in sync via a display and loudspeaker using the Presentation Clock mechanism defined in the first example.
Timing Diagram
The diagram above illustrates the re-timing action required in this scenario. It is broadly similar to the re-timing performed in the Outside Broadcast scenario (refer to the second timing diagram in that scenario, which covers the use of the re-timing component). A key difference here is that the two Studios and the Playout Gallery might be operating in the same Time Context with access to the same Wall Clock. This is a clear example of end-to-end timing only being relevant within certain bounds.
The re-timing devices in this example re-write the Time Values arriving from the Studios (whatever these values are) to approximately the current Wall Clock value. This makes them immediately suitable for consumption by the next device in the chain. As with the Outside Broadcast scenario, it is important to ensure that the relative timing relationship between the video and audio from each Studio is maintained.
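A minimal sketch of this form of re-timing, again assuming milliseconds and illustrative names; the offset is measured once per Studio and applied uniformly to its video and audio samples, which is what preserves their relative timing:

```python
def studio_offset(reference_time_value, wall_clock_now):
    """Measured once per Studio: how far its incoming Time Values
    lag the local Wall Clock."""
    return wall_clock_now - reference_time_value

def retime(time_value, offset):
    """Applied to every video and audio sample from that Studio, moving
    the Time Values to approximately the current Wall Clock value while
    leaving the video/audio relationship untouched."""
    return time_value + offset
```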
As in the Outside Broadcast scenario, without this re-timing action the Flow arriving earliest must be delayed to wait for the Flow from the other Studio, thereby delaying the Playout Gallery output.
from the other Studio thereby delaying the Playout Gallery output. This would mean that the Playout Gallery timing becomes strongly coupled to the latencies of the processing chains in the Studios; therefore it would be problematic for the Playout Gallery to accept an input from an additional Studio with a higher latency processing chain.
Remote Production
Objective
Capture video at an Outside Broadcast location in high quality (HQ) and low quality (Proxy) versions. Produce a mix based upon the low quality version at a Studio facility, before instructing the Outside Broadcast site to re-produce this mix precisely and forward it on to the Studio.
Note: The intention is to avoid sending each individual high quality camera Flow back to the Studio, but instead send back individual low quality versions and a single high quality finished programme output.
In order to demonstrate the effects of delay in the processing chain, the following delay figures are assumed:
- Cameras impose 40ms of delay
- Vision Mixers impose 40ms of delay
- Displays impose 40ms of delay
- Outside Broadcast network transit imposes 20ms of delay in both directions
Description
Video is captured simultaneously by Outside Broadcast cameras. The cameras have access to a source of Wall Clock time which they use to generate a Time Value for each video frame captured. These Time Values are attached to their corresponding samples and persist with the content through the processing chain.
Low quality (Proxy) Flows are sent to the Studio facility in order to produce a mix, whilst high quality (HQ) Flows remain at the Outside Broadcast site for later processing. Once the edit decisions have been relayed from the Studio facility, the high quality Flows can be mixed in the same way and transmitted on.
Identity in this scenario is critical. The camera HQ and Proxy Flows share a Source ID and Time Values, which ensures that the Proxy edit can be re-created precisely at the Outside Broadcast location (using the HQ Flows) prior to being sent back to the Studio facility.
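To illustrate why this shared identity matters, a sketch of re-applying the Proxy edit decisions to the HQ Flows; the data structures, Source IDs and timings here are hypothetical:

```python
edit_decisions = [
    # (Source ID, in Time Value, out Time Value), recorded against the Proxy mix
    ("camera-1-source", 0, 2000),
    ("camera-2-source", 2000, 5000),
]

def apply_edit(hq_flows, decisions):
    """Recreate the Proxy edit precisely against the HQ Flows. This works
    because HQ and Proxy share Source IDs and Time Values."""
    output = []
    for source_id, t_in, t_out in decisions:
        for time_value, frame in hq_flows[source_id]:
            if t_in <= time_value < t_out:
                output.append((time_value, frame))
    return output
```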
Timing Diagram
The diagram above shows that in this scenario it is unnecessary to modify any Time Values before they are used. However, at the Outside Broadcast location it is important to delay the HQ camera feeds from entering the vision mixer until the edit decisions on the low quality Proxy content have been relayed back from the Studio.
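As a rough illustration of the buffering this implies, assuming (hypothetically) that an edit decision is issued as soon as the Studio's vision mixer has processed the corresponding Proxy frame (values in milliseconds):

```python
camera = 40
transit = 20        # network transit, each direction
studio_mixer = 40   # assumed decision latency at the Studio

decision_return = camera + transit + studio_mixer + transit   # 120ms after capture
hq_ready = camera                                             # HQ frame available at 40ms
hq_buffering = decision_return - hq_ready                     # at least 80ms of HQ buffering
```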