Composite Media Operations

This document examines how a set of basic media operations might typically be joined together to achieve a particular outcome or to model a physical piece of equipment.

Each “system” is analysed in order to show where, and how, identity and timing must be represented. It is then refined to show how the identity and timing representation can be optimised to avoid its handling becoming too onerous.

Terminology

To ease the explanation of how identity and timing apply to composite operations, some “physical” entities are referred to using modelling terminology from outside this specification. These are as follows:

Device: A logical or physical grouping of processing capability.
Sender: A logical entity representing the process which encapsulates media for IP streaming.
Receiver: A logical entity representing the process which decapsulates media from an IP link.

Optimising Composite Media Operations

When content is made available to other users or systems outside the scope of a single “black box”, it is critical to ensure that its identity and timing are represented correctly. The simplest way to achieve this is to generate new identity and timing data at the output of each functional block in the system following the guide in the previous document. The downside to this approach is that systems which comprise many hundreds of these granular operations will have to signal vast numbers of new Sources and Flows.

In order to reduce the number of Sources and Flows that a given system must advertise, the graph of identity and timing related operations can be simplified. This is achieved as follows:

For each output which a given system can make: advertise its Flow and the Flow’s Source.
For each internal Source in the system which could at some future time be used as a reference to perform a control operation (for example sub-dividing a mux or splitting audio channels): advertise this Source.
For each internal Flow in the system which could at some future time be sent as an output from the system (potentially by changing the system’s internal routing): advertise this Flow and its Source.
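
The three rules above can be sketched as a small filter over the identity graph. This is a minimal illustration only: the `Node` class, its flag names and the identifiers `S1`, `F1a` and so on are hypothetical and are not defined by this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Identity produced by one functional block (hypothetical model)."""
    source_id: str
    flow_id: str
    is_output: bool = False           # rule 1: the Flow leaves the system
    control_reference: bool = False   # rule 2: the Source may be referenced by a control operation
    routable_to_output: bool = False  # rule 3: the Flow could be routed out in future


def advertised(nodes):
    """Return the (sources, flows) which the three rules require to be advertised."""
    sources, flows = set(), set()
    for node in nodes:
        if node.is_output or node.routable_to_output:
            flows.add(node.flow_id)
            sources.add(node.source_id)
        if node.control_reference:
            sources.add(node.source_id)
    return sources, flows


nodes = [
    Node("S1", "F1a"),                           # purely internal: not advertised
    Node("S2", "F2a", control_reference=True),   # e.g. a mux which may be sub-divided
    Node("S3", "F3a", routable_to_output=True),  # could be routed out later
    Node("S3", "F3b", is_output=True),           # a current output
]
sources, flows = advertised(nodes)
print(sorted(sources), sorted(flows))
```

Note that a Source advertised under rule 2 does not bring its internal Flow with it; only the Source is needed as a reference.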

This approach will be highlighted in the following examples. In each case a functional view of a system is shown. Each functional block generates a new Source and/or Flow. Having analysed each system and optimised its identity graph the resultant Sources and Flows are shown in green text, and the unnecessary Sources and Flows are shown in red.

Where optimisations such as these are made, a single new Source or Flow will identify that multiple operations have occurred. It is important that this single identified entity still conveys the full set of operations which have occurred, such as changes to metadata attributes, Time Values and the Time Context.

Worked Examples

The following set of worked examples is intended to show how identity and timing relate to some real world devices or processes.

The first example in this set is described in more detail to ensure the results of bringing together several basic operations are clear.

In each case where Time Values are generated, the device is fed with a wall clock reference which allows it to associate these Time Values with the content using a single Time Context.

Key

Edge Capture Device

Primary Observation: This example demonstrates a device which generates multiple Flows of the same Source.

This is a device sitting at the edge of the environment containing strong identity and timing. This might be at the boundary between analogue and digital, or at the boundary between a legacy digital format (incapable of identity and timing carriage) and a modern digital format.

Time Values are associated with content at the earliest opportunity, so in this case it occurs within the “SDI Input” functional block.

Functional View

The diagram below shows a block diagram of the functional components of an SDI to IP gateway device.

SDI Gateway Device: Functional View

Each of the blocks in this diagram corresponds to one of the granular media operations described in the previous document:

Capture
- The “SDI Input” generates a Source and a Flow as there was no identity carried within the SDI bitstream. The Flow contains a multiplex of video, audio and data content.
Frame Synchroniser / Rate Conforming
- The “Frame Sync” re-positions the content in time to ensure it is phase aligned with the system clock. As a result of this change in timing it generates a new Source and a new Flow.
Split
- The “De-Multiplexer” components split the Video, Audio and Data out of the SDI formatted bitstream. As a result of this change they generate a new Source and a new Flow.
- The “Audio Splitter” components split the (up to 16 channel) audio Flow generated by the “Audio De-Multiplexer” into smaller components. Each Audio Splitter generates a new Source and a new Flow as a result.
Audio / Video Bit Depth Modification
- In the video chain, the bit depth is converted from 10-bit to 8-bit prior to encoding. This results in a new Flow.
Chroma Subsampling Modification
- In the video chain, the chroma subsampling is modified from 4:2:2 to 4:2:0 prior to encoding. This results in a new Flow.
Encode / Decode
- In the video chain, the raw frames are encoded as H.264. This results in a new Flow.

Optimised Logical View

In order to avoid an unnecessary proliferation of identity, this large set of Sources and Flows can be optimised down to the following.

Note that each of the Source and Flow identifiers is re-defined as a result of the identity optimisation. For example, the raw video output which was previously identified as Flow C1 above is now identified as Flow A1.

SDI Gateway Device: Logical View

In this diagram, only the essential Sources and Flows are identified. These were previously highlighted in green in the functional block diagram. The Sources and Flows which are highlighted in red in the functional diagram are unnecessary as these remain internal to the device for all time.

Codec

Primary Observation: This example demonstrates a device which generates a Flow associated with a pre-existing Source.

This is a device which consumes a network stream containing raw media and generates a new network stream containing the same content but in a compressed form. In this example, in order to produce an 8-bit 4:2:0 H.264 output, the input video must first have its bit depth and chroma subsampling modified, however no timing changes are required.

It is important to note that there is no change to the Source identity as a result of this operation. This mirrors what would happen if the encode operation is performed in the device creating the raw video Flow.
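
As a rough sketch of this behaviour, the composite encode can be modelled as a single step which issues a new Flow ID while inheriting the Source ID. The `Flow` class, its attribute names and the `derive_flow` helper are assumptions made for illustration and are not part of this specification.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flow:
    """Hypothetical Flow record: one representation of a Source's content."""
    flow_id: str
    source_id: str
    bit_depth: int
    chroma: str
    codec: str


def derive_flow(parent, **changes):
    """Apply format changes as one composite step: new Flow ID, same Source ID."""
    return replace(parent, flow_id=str(uuid.uuid4()), **changes)


raw = Flow(str(uuid.uuid4()), str(uuid.uuid4()), 10, "4:2:2", "raw")

# Bit depth, chroma subsampling and encoding are collapsed into a single new Flow.
encoded = derive_flow(raw, bit_depth=8, chroma="4:2:0", codec="H.264")

print(encoded.source_id == raw.source_id, encoded.flow_id != raw.flow_id)
```

The single derived Flow still records every attribute which changed, so the intermediate Flows do not need to be identified separately.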

Functional View

The diagram below shows a block diagram of the functional components of an H.264 video encoder.

Codec: Functional View

Optimised Logical View

In order to avoid an unnecessary proliferation of identity, this set of Flows can be optimised down to the following.

Codec: Logical View

Content API

Primary Observation: This example demonstrates a device where no modifications are made to the identity or timing.

This is a device which allows content to be uploaded to it via an API, and then downloaded again via a separate (or the same) API. This device makes no modifications to the content but allows it to be stored for later consumption. A change to the content’s byte packing might be required in order to store it, but it will always be returned to the user in the same form that it was uploaded in allowing it to maintain the same identity.

Note that whilst APIs would typically not be characterised as Senders or Receivers, the same coloured blocks are used here in order to signify the direction in which content is moving.

Functional View

The diagram below shows a block diagram of the functional components of a content API and its backing storage device.

Content API: Functional View

Optimised Logical View

There is no generation of identity in this example, so the optimised view produces the same identifiers as the original functional view.

Content API: Logical View

Playout Server

Primary Observation: This example demonstrates why timing data must be changed during live content playout, and as such why this requires a new Source ID.

This is a device which can consume streamed content and store it, before playing it back out as a live stream at a later time. Unlike a plain storage and retrieval server, a playout server will typically need to re-stamp the Time Values used by the content so that it will correctly synchronise with other content generated in the live environment.

Note that this is a simple example of a playout server. There might be an additional requirement to pause, rewind or skip periods of time in a real world device. These actions might add additional functional blocks but could likely be optimised down to the same logical view shown below.
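
The re-stamping step might be sketched as follows. This is an illustration under stated assumptions: Time Values are modelled as exact seconds using `Fraction`, and the `play_out` function is hypothetical rather than anything defined by this specification.

```python
import uuid
from fractions import Fraction


def play_out(stored_time_values, playout_start):
    """Re-stamp stored Time Values so that the first one lands on playout_start."""
    offset = playout_start - stored_time_values[0]
    restamped = [t + offset for t in stored_time_values]
    # The content's relationship to time has changed, so a new Source ID is issued.
    return str(uuid.uuid4()), restamped


# Three frames recorded at 25Hz, starting 100 seconds into the recording's Time Context.
recorded = [Fraction(100) + Fraction(n, 25) for n in range(3)]

# Played back later, aligned to the live environment at 5000 seconds.
new_source_id, live = play_out(recorded, Fraction(5000))
print(live[0], live[1] - live[0])
```

The spacing between frames is preserved; only their position in time moves, which is what distinguishes playout from plain storage and retrieval.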

Functional View

The diagram below shows a block diagram of the functional components of a playout server and its backing storage device.

Playout Server: Functional View

Optimised Logical View

In this case the optimisation does not reduce the number of identifiers because of the simplicity of the device.

Playout Server: Logical View

Audio Matrix

Primary Observation: This example demonstrates how identity must change as audio channels are split or combined.

This is a device which can consume one or more audio streams containing multiple audio channels. The device can then re-arrange these channels to create multiple different output streams which comprise a mixture of the available input channels. No timing modifications are made to the content whilst this operation is performed so all input streams are expected to be well time-aligned.
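
A minimal sketch of this re-arrangement is shown below. The `AudioSource` class, its `parents` field and the `route` helper are hypothetical names chosen for illustration, and timing is deliberately left out because the matrix does not modify it.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSource:
    """Hypothetical audio Source: an identity plus the channels it carries."""
    source_id: str
    channels: tuple
    parents: tuple = ()  # Source IDs this Source was derived from


def route(inputs, selection):
    """Build one matrix output from (input index, channel index) pairs."""
    channels = tuple(inputs[i].channels[c] for i, c in selection)
    # Record each contributing input once, in order of first use.
    parents = tuple(dict.fromkeys(inputs[i].source_id for i, _ in selection))
    # The channel set differs from every input, so a new Source ID is issued.
    return AudioSource(str(uuid.uuid4()), channels, parents)


in_a = AudioSource(str(uuid.uuid4()), ("A-L", "A-R"))
in_b = AudioSource(str(uuid.uuid4()), ("B-L", "B-R"))

# Output stream: left channel of input A combined with right channel of input B.
out = route([in_a, in_b], [(0, 0), (1, 1)])
print(out.channels)
```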

Functional View

The diagram below shows a block diagram of the functional components of an audio matrix device.

Audio Matrix: Functional View

Optimised Logical View

In this case the optimisation does not reduce the number of identifiers because of the simplicity of the device.

Audio Matrix: Logical View

Frame Rate Converter

Primary Observation: This example demonstrates how Time Values can be modified in one specific case without resulting in a new Source ID.

This is a device which can consume a video stream at one frame rate and produce an output at a different frame rate. This output rate might be achieved by dropping frames in a simple case (such as 50Hz to 25Hz), or by interpolating between frames (such as 29.97Hz to 25Hz). This will result in the generation of new Time Values which are either a subset of the input Time Values or are generated by interpolating between the input Time Values.

In this example it is critical that the duration of the output content remains the same (that is, no time stretch or squash is applied). If a time stretch or squash is applied then a new Source ID must be generated as the content’s relationship to time has been modified.
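
The simple frame-dropping case can be checked with a short sketch. Time Values are modelled here as exact seconds using `Fraction`, and the helper names are hypothetical; both are assumptions made for illustration only.

```python
from fractions import Fraction


def drop_frames(time_values, keep_every):
    """Keep every Nth frame's Time Value, discarding the rest."""
    return time_values[::keep_every]


def duration(time_values, frame_rate):
    """Total duration in seconds: frame count multiplied by the frame period."""
    return len(time_values) * Fraction(1, frame_rate)


# One second of 50Hz input converted to 25Hz by dropping alternate frames.
input_times = [Fraction(n, 50) for n in range(50)]
output_times = drop_frames(input_times, 2)

print(len(output_times), duration(input_times, 50), duration(output_times, 25))
```

The output Time Values are a strict subset of the input Time Values and the duration is unchanged, which is why the Source ID can be retained in this case.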

Functional View

The diagram below shows a block diagram of the functional components of a basic frame rate conversion device.

Frame Rate Converter: Functional View

Optimised Logical View

In this case the optimisation does not reduce the number of identifiers because of the simplicity of the device.

Frame Rate Converter: Logical View

Audio Mixer

Primary Observation: This example demonstrates how a more complex device should persist or generate fresh identifiers and Time Values.

This is a device which can consume multiple audio streams, apply processing to them, and then combine several of these audio inputs together to produce a new output.

Time Values are associated with content at the earliest opportunity. For example, where audio is mixed in a bus Time Values are generated for the new content as the mix is created.

Functional View

The diagram below shows a block diagram of the functional components of a simple audio mixing device.

Audio Mixer: Functional View

Optimised Logical View

In order to avoid an unnecessary proliferation of identity, this large set of Sources and Flows can be optimised down to the following:

Audio Mixer: Logical View

If the audio mixer had the capability to send individual channels’ feeds outside of the mixer for further processing (that is, an insert) then additional Sources and Flows might need to be identified. For example, if a feed were sent following the EQ step then the EQ step’s Source and Flow would need to be explicitly identified.

Vision Switcher

Primary Observation: This example demonstrates how a more complex device should persist or generate fresh identifiers and Time Values.

This is a device which can consume multiple video streams, apply processing to them, and then combine several of these video inputs together in order to produce a new output.

Time Values are associated with content at the earliest opportunity. For example, where video is mixed in a bus Time Values are generated for this new content as the mix is created.

Functional View

The diagram below shows a block diagram of the functional components of a simple vision switching device.

Vision Switcher: Functional View

Optimised Logical View

In this case the optimisation does not reduce the number of identifiers because of the simplicity of the device.

Note that whilst the pattern generator’s feed is not sent out of the vision switcher, it is still explicitly identified so that the output video Source can have a comprehensive ancestry tracked.

Vision Switcher: Logical View

Non-Linear Editor

Primary Observation: This example demonstrates how non-live workflows have strong parallels with each of the previously noted live working examples. However, their ability to create many more revisions of the same output results in a requirement to generate new identifiers more frequently.

This is a software application which is operated on a generic computer. This application can import video files before presenting their contents to the user. The user can perform various operations on this media content such as crops, time stretches (slow-mo), keying and applying overlays, as per the previous document.

The act of performing each action will typically result in new identity for the resultant content, which might require a new Source ID and/or Flow ID.

Functional View

The diagram below shows a simple block diagram of the functional components of a non-linear editor. The actions performed on the media are distilled into the “Edit Decisions” block because a vast array of different actions can be performed over the duration of the individual pieces of input content.

The first time content is exported from the edit operation, the output Source ID could be “D”, however if an edit decision is then changed, the Source ID of the next export must also change to reflect this (for example to “E”). For this reason the Source ID is noted in the diagrams as being uncertain (or “?”) as it may change with successive exports.

Non-Linear Editor: Functional View

Optimised Logical View

In order to avoid unnecessary complexity, only the finished output of the video edit process is assigned new identity. Typically every time an export and outgest operation is performed the application must assign the output content a new Source ID and a new Flow ID. This is required as each different edit will likely modify the content’s relationship to time, or change the information which is contained within the output content. The only exception to this rule is where two versions of the same edit are exported, for example one in raw video and one in H.264. These are permitted to be Flows of the same Source.
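
One way an application might apply this rule is sketched below. The `Exporter` class and the use of the edit decision list as a lookup key are assumptions made for illustration; this specification does not prescribe how an editor tracks its exports.

```python
import uuid


class Exporter:
    """Hypothetical export step which assigns Source and Flow IDs."""

    def __init__(self):
        self._source_ids = {}  # edit decisions -> Source ID

    def export(self, edit_decisions, encoding):
        key = tuple(edit_decisions)
        # A changed edit yields a new Source; the same edit re-uses its Source.
        if key not in self._source_ids:
            self._source_ids[key] = str(uuid.uuid4())
        # Every export is a distinct Flow (assumed here for simplicity).
        return {
            "source_id": self._source_ids[key],
            "flow_id": str(uuid.uuid4()),
            "encoding": encoding,
        }


exporter = Exporter()
raw = exporter.export(["trim 0-10s"], "raw")
h264 = exporter.export(["trim 0-10s"], "H.264")      # same edit, second encoding
revised = exporter.export(["trim 0-12s"], "raw")     # edit decision changed

print(raw["source_id"] == h264["source_id"], raw["source_id"] == revised["source_id"])
```

The two encodings of the unchanged edit share a Source but have distinct Flows, while the revised edit receives a new Source.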

Non-Linear Editor: Logical View

Documentation built at 04:51:55 CDT on 2024-08-29.
(c) AMWA 2024. RAML/JSON licensed under Apache 2.0. Documentation licensed under CC BY-ND 4.0.