Composite Media Operations
←Basic Media Operations · Index↑ · Media Production Chains→
This document examines how a set of basic media operations might typically be joined together to achieve a particular outcome or to model a physical piece of equipment.
Each “system” is analysed in order to show where, and how, identity and timing must be represented. It is then refined to show how the identity and timing representation can be optimised to avoid its handling becoming too onerous.
Terminology
To ease explanation of how identity and timing apply to composite operations some “physical” entities are referred to using modelling terminology from outside of this specification. These are as follows:
- Device: A logical or physical grouping of processing capability.
- Sender: A logical entity representing the process which encapsulates media for IP streaming.
- Receiver: A logical entity representing the process which decapsulates media from an IP link.
Optimising Composite Media Operations
When content is made available to other users or systems outside the scope of a single “black box”, it is critical to ensure that its identity and timing is represented correctly. The simplest way to achieve this is to generate new identity and timing data at the output of each functional block in the system following the guide in the previous document. The downside to this approach is that systems which comprise many hundreds of these granular operations will have to signal vast numbers of new Source
s and Flow
s.
In order to reduce the number of Source
s and Flow
s that a given system must advertise, the graph of identity and timing related operations can be simplified. This is achieved as follows:
- For each output which a given system can make: advertise its
Flow
and theFlow
’sSource
. - For each internal
Source
in the system which could at some future time be used as a reference to perform a control operation (for example sub-dividing a mux or splitting audio channels): advertise thisSource
. - For each internal
Flow
in the system which could at some future time be sent as an output from the system (potentially by changing the system’s internal routing): advertise thisFlow
and itsSource
.
This approach will be highlighted in the following examples. In each case a functional view of a system is shown. Each functional block generates a new Source
and/or Flow
. Having analysed each system and optimised its identity graph the resultant Source
s and Flow
s are shown in green text, and the unnecessary Source
s and Flow
s are shown in red.
Where optimisations such as these are made, a single new Source
or Flow
will identify that multiple operations have occurred. It is important that this single identified entity still conveys the full set of operations which have occurred, such as changes to metadata attributes, Time Value
s and the Time Context
.
Worked Examples
The following set of worked examples are intended to help to show how identity and timing relates to some real world devices or processes.
The first example in this set is described in more detail to ensure the results of bringing together several basic operations are clear.
In each case where Time Value
s are generated, the device is fed with a wall clock reference which allows it to associate these Time Value
s with the content using a single Time Context
.
Key
Edge Capture Device
Primary Observation: This example demonstrates a device which generates multiple
Flow
s of the sameSource
.
This is a device sitting at the edge of the environment containing strong identity and timing. This might be at the boundary between analogue and digital, or at the boundary between a legacy digital format (incapable of identity and timing carriage) and a modern digital format.
Time Value
s are associated with content at the earliest opportunity, so in this case it occurs within the “SDI Input” functional block.
Functional View
The diagram below shows a block diagram of the functional components of an SDI to IP gateway device.
Each of the blocks in this diagram correspond to one of the granular media operations described in the previous document:
- Capture
- The “SDI Input” generates a
Source
and aFlow
as there was no identity carried within the SDI bitstream. TheFlow
contains a multiplex of video, audio and data content.
- The “SDI Input” generates a
- Frame Synchroniser / Rate Conforming
- The “Frame Sync” re-positions the content in time to ensure it is phase aligned with the system clock. As a result of this change in timing it generates a new
Source
and a newFlow
.
- The “Frame Sync” re-positions the content in time to ensure it is phase aligned with the system clock. As a result of this change in timing it generates a new
- Split
- The “De-Multiplexer” components split the Video, Audio and Data out of the SDI formatted bitstream. As a result of this change they generate a new
Source
and a newFlow
. - The “Audio Splitter” components split the (up to 16 channel) audio
Flow
generated by the “Audio De-Multiplexer” into smaller components. Each Audio Splitter generates a newSource
and a newFlow
as a result.
- The “De-Multiplexer” components split the Video, Audio and Data out of the SDI formatted bitstream. As a result of this change they generate a new
- Audio / Video Bit Depth Modification
- In the video chain, the bit depth is converted from 10-bit to 8-bit prior to encoding. This results in a new
Flow
.
- In the video chain, the bit depth is converted from 10-bit to 8-bit prior to encoding. This results in a new
- Chroma Subsampling Modification
- In the video chain, the chroma subsampling is modified from 4:2:2 to 4:2:0 prior to encoding. This results in a new
Flow
.
- In the video chain, the chroma subsampling is modified from 4:2:2 to 4:2:0 prior to encoding. This results in a new
- Encode / Decode
- In the video chain, the raw frames are encoded as H.264. This results in a new
Flow
.
- In the video chain, the raw frames are encoded as H.264. This results in a new
Optimised Logical View
In order to avoid an unnecessary proliferation of identity, this large set of Source
s and Flow
s can be optimised down to the following.
Note that each of the Source
and Flow
identifiers are re-defined as a result of the identity optimisation. For example, the raw video output which was previously identified as Flow
C1 above is now identified as Flow
A1.
In this diagram, only the essential Source
s and Flow
s are identified. These were previously highlighted in green in the functional block diagram. The Source
s and Flow
s which are highlighted in red in the functional diagram are unnecessary as these remain internal to the device for all time.
Codec
Primary Observation: This example demonstrates a device which generates a
Flow
associated with a pre-existingSource
.
This is a device which consumes a network stream containing raw media and generates a new network stream containing the same content but in a compressed form. In this example, in order to produce an 8-bit 4:2:0 H.264 output, the input video must first have its bit depth and chroma subsampling modified, however no timing changes are required.
It is important to note that there is no change to the Source
identity as a result of this operation. This mirrors what would happen if the encode operation is performed in the device creating the raw video Flow
.
Functional View
The diagram below shows a block diagram of the functional components of an H.264 video encoder.
Optimised Logical View
In order to avoid an unnecessary proliferation of identity, this set of Flow
s can be optimised down to the following.
Content API
Primary Observation: This example demonstrates a device where no modifications are made to the identity or timing.
This is a device which allows content to be uploaded to it via an API, and then downloaded again via a separate (or the same) API. This device makes no modifications to the content but allows it to be stored for later consumption. A change to the content’s byte packing might be required in order to store it, but it will always be returned to the user in the same form that it was uploaded in allowing it to maintain the same identity.
Note that whilst APIs would typically not be characterised as Senders or Receivers, the same coloured blocks are used here in order to signify the direction in which content is moving.
Functional View
The diagram below shows a block diagram of the functional components of a content API and its backing storage device.
Optimised Logical View
There is no generation of identity in this example, so the optimised view produces the same identifiers as the original functional view.
Playout Server
Primary Observation: This example demonstrates why timing data must be changed during live content playout, and as such why this requires a new
Source
ID.
This is a device which can consume streamed content and store it, before playing it back out as a live stream at a later time. Unlike a plain storage and retrieval server, a playout server will typically need to re-stamp the Time Value
s used by the content so that it will correctly synchronise with other content generated in the live environment.
Note that this is a simple example of a playout server. There might be an additional requirement to pause, rewind or skip periods of time in a real world device. These actions might add additional functional blocks but could likely be optimised down to the same logical view shown below.
Functional View
The diagram below shows a block diagram of the functional components of a playout server and its backing storage device.
Optimised Logical View
In this case the optimisation does not reduce the number of identifiers because of the simplicity of the device.
Audio Matrix
Primary Observation: This example demonstrates how identity must change as audio channels are split or combined.
This is a device which can consume one or more audio streams containing multiple audio channels. The device can then re-arrange these channels to create multiple different output streams which comprise a mixture of the available input channels. No timing modifications are made to the content whilst this operation is performed so all input streams are expected to be well time-aligned.
Functional View
The diagram below shows a block diagram of the functional components of an audio matrix device.
Optimised Logical View
In this case the optimisation does not reduce the number of identifiers because of the simplicity of the device.
Frame Rate Converter
Primary Observation: This example demonstrates how
Time Value
s can be modified in one specific case without resulting in a newSource
ID.
This is a device which can consume a video stream at one frame rate and produce an output at a different frame rate. This output rate might be achieved by dropping frames in a simple case (such as 50Hz to 25Hz), or by interpolating between frames (such as 29.94Hz to 25Hz). This will result in the generation of new Time Value
s which are either a subset of the input Time Value
s or are generated by interpolating between the input Time Value
s.
In this example it is critical that the duration of the output content remains the same (that is, no time stretch or squash is applied). If a technique of this form is applied then a new Source
ID must be generated as the content’s relationship to time has been modified.
Functional View
The diagram below shows a block diagram of the functional components of a basic frame rate conversion device.
Optimised Logical View
In this case the optimisation does not reduce the number of identifiers because of the simplicity of the device.
Audio Mixer
Primary Observation: This example demonstrates how a more complex device should persist or generate fresh identifiers and
Time Value
s.
A device which can consume multiple audio streams, apply processing to them, and then combine several of these audio inputs together to produce a new output.
Time Value
s are associated with content at the earliest opportunity. For example, where audio is mixed in a bus Time Value
s are generated for the new content as the mix is created.
Functional View
The diagram below shows a block diagram of the functional components of a simple audio mixing device.
Optimised Logical View
In order to avoid an unnecessary proliferation of identity, this large set of Source
s and Flow
s can be optimised down to the following:
If the audio mixer had the capability to send individual channels’ feeds outside of the mixer for further processing (that is, an insert) then additional Source
s and Flow
s might need to be identified. For example, if a feed were sent following the EQ step then the EQ step’s Source
and Flow
would need to be explicitly identified.
Vision Switcher
Primary Observation: This example demonstrates how a more complex device should persist or generate fresh identifiers and
Time Value
s.
This is a device which can consume multiple video streams, apply processing to them, and then combine several of these video inputs together in order to produce a new output.
Time Value
s are associated with content at the earliest opportunity. For example, where video is mixed in a bus Time Value
s are generated for this new content as the mix is created.
Functional View
The diagram below shows a block diagram of the functional components of a simple vision switching device.
Optimised Logical View
In this case the optimisation does not reduce the number of identifiers because of the simplicity of the device.
Note that whilst the pattern generator’s feed is not sent out of the vision switcher, it is still explicitly identified so that the output video Source
can have a comprehensive ancestry tracked.
Non-Linear Editor
Primary Observation: This example demonstrates how non-live workflows have strong parallels with each of the previously noted live working examples. However, their ability to create many more revisions of the same output result in a requirement to generate new identifiers more frequently.
This is a software application which is operated on a generic computer. This application can import video files before presenting their contents to the user. The user can perform various operations on this media content such as crops, time stretches (slow-mo), keying, applying overlays etc as-per the previous document.
The act of performing each action will typically result in new identity for the resultant content, which might require a new Source
ID and/or Flow
ID.
Functional View
The diagram below shows a simple block diagram of the functional components of a non-linear editor. The actions performed on the media are distilled into the “Edit Decisions” block because a vast array of different actions can be performed over the duration of the individual pieces of input content.
The first time content is exported from the edit operation, the output Source
ID could be “D”, however if an edit decision is then changed, the Source
ID of the next export must also change to reflect this (for example to “E”). For this reason the Source
ID is noted in the diagrams as being uncertain (or “?”) as it may change with successive exports.
Optimised Logical View
In order to avoid unnecessary complexity, only the finished output of the video edit process is assigned new identity. Typically every time an export and outgest operation is performed the application must be assign the output content a new Source
ID and a new Flow
ID. This is required as each different edit will likely modify the content’s relationship to time, or change the information which is contained within the output content. The only exception to this rule is where two versions of the same edit are exported. For example, one in raw video and one in H.264 – these are permitted to be Flow
s of the same Source
.