The Showmax Engineering story, Part III.
Delivering content securely, with best possible user experience
On the Showmax Engineering blog, we try to explain solutions to a pretty specific problem: how to deliver high-quality video streaming in Africa. To make our stories more intelligible, we started the series The Showmax Engineering story, with the goal of explaining the in-depth media engineering terms that we use so commonly.
Now, with the very last piece of vocabulary at hand, you have the tools to understand it all completely. If you missed the previous parts, you can read about encoding terms in The Showmax Engineering story, Part I, and streaming terms in The Showmax Engineering story, Part II. In this very last edition, we describe the terminology around content protection and quality evaluation.
Content protection-related terms
Let’s have a look behind the curtain of content protection against piracy.
Digital rights management (DRM) is a technology-based control over copyright-protected content, and it’s a fundamental part of the content protection environment. Its rules are defined within the contracts with content owners — in our case, that means contracts with studios. Basically, for every piece of content we have device constraints (what device we can display the content on and in which quality/resolution) and time constraints (how long we’re allowed to host it).
There are three main DRM types used in the market.
Widevine by Google is mainly supported by web browsers (Chrome, Firefox, Edge), Android devices, and SmartTV devices. Google provides a cloud solution for the license server, which is free to use, and includes no charges for license generation or device integration. It also provides an SDK to implement your own license server.
PlayReady by Microsoft is mainly supported by the Edge web browser, SmartTV devices and gaming consoles (Xbox, PlayStation). Microsoft provides the SDK to implement your own license server, or you can use any third party DRM provider that supports PlayReady.
FairPlay by Apple is mainly supported by the Safari web browser and iOS/tvOS devices, and was originally developed to protect music in the iTunes Store. Apple itself doesn’t provide any license server. You can either follow the guide on how to implement your own server, or use a third party DRM provider. Further details and specifications that go beyond the scope of our vocabulary can be found in the OTT Verse glossary.
DRM encryption algorithms
All three variants use the AES family of encryption algorithms; the only difference is the supported mode: full sample encryption in CTR mode (cenc) or pattern encryption in CBC mode (cbcs). Both modes are defined in the Common Encryption Scheme (CENC), which sets the standards for encryption and key mapping.
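To make the difference between the two modes more concrete, here is a minimal Python sketch of which 16-byte AES blocks pattern encryption would touch. The 1:9 pattern (encrypt one block, skip nine) is an assumption based on what is commonly used for cbcs video, not something stated above. The function name is hypothetical, and no real cryptography is performed.

```python
AES_BLOCK_SIZE = 16  # AES always operates on 16-byte blocks


def encrypted_block_indices(num_bytes, crypt_blocks=1, skip_blocks=9):
    """Return the indices of the 16-byte blocks that pattern encryption covers.

    Assumption: the pattern repeats as `crypt_blocks` encrypted blocks followed
    by `skip_blocks` clear blocks, and a trailing partial block stays clear.
    """
    if crypt_blocks <= 0 or skip_blocks < 0:
        raise ValueError("crypt_blocks must be positive and skip_blocks non-negative")
    full_blocks = num_bytes // AES_BLOCK_SIZE  # ignore the trailing partial block
    period = crypt_blocks + skip_blocks
    return [i for i in range(full_blocks) if i % period < crypt_blocks]


if __name__ == "__main__":
    # 25 full blocks plus 5 trailing bytes: only blocks 0, 10 and 20 are encrypted.
    print(encrypted_block_indices(16 * 25 + 5))
```

With `skip_blocks=0` every full block is covered, which is closer to what full sample encryption does (CTR mode additionally covers the trailing partial block, because it works as a stream cipher).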
If DRM systems support CENC, content can be decrypted by any of these systems as long as the same key IDs and content keys are used. This allows content providers to encrypt the content once and serve it to a broader range of devices using different DRM systems. This approach is also called multiDRM.
To close this section, we need to mention two MPEG standards: MPEG Common Encryption, which defines encryption and container formats, and MPEG-A Part 19: Common Media Application Format (CMAF), which handles conversion between them without re-encryption and re-encoding. Unfortunately, CMAF does not define which AES mode should be used. While Apple decided to support “cbcs”, Widevine and PlayReady started with “cenc” and added “cbcs” mode later. As a result, adoption of CMAF in practice is slow, as you still need to have both variants to support older devices.
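The need to keep both variants can be sketched as a small selection function. This is only an illustration of the reasoning above: the function name and the `device_supports_cbcs` flag are hypothetical, and a real player would detect the capability through its DRM stack.

```python
def pick_encryption_variant(drm, device_supports_cbcs):
    """Choose which encrypted variant of a title to serve to a device.

    FairPlay only works with "cbcs"; Widevine and PlayReady started with
    "cenc", so older devices still need the "cenc" variant.
    """
    drm = drm.lower()
    if drm == "fairplay":
        return "cbcs"
    if drm in ("widevine", "playready"):
        return "cbcs" if device_supports_cbcs else "cenc"
    raise ValueError(f"unknown DRM system: {drm}")


if __name__ == "__main__":
    print(pick_encryption_variant("Widevine", device_supports_cbcs=False))  # cenc
```

Only when every supported Widevine and PlayReady device reports cbcs support can the cenc variant be dropped.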
Technically, the content is encrypted by a key (a decryption key is also referred to as a content key). The key is shared as part of a license, which defines the conditions under which a device can use the key(s) to decrypt the content. Limited validity, the required security level of the device, and the minimum supported HDCP version are all examples of such conditions. To find the right key, the device must provide a content ID (an ID identifying a piece of content in the register of the DRM provider) and/or a key ID (the ID of the key used for content encryption) to the license server. Those values are usually baked into playlists/manifests, or directly into media segments. The license server then issues the actual license with the key to the device. The license is processed in a trusted execution environment (TEE) of the device: a secure area of the processor that ensures that the code and loaded data are protected, and provides hardware security for trusted applications.
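The license exchange described above can be sketched as a toy, in-memory license server. Everything here is hypothetical (class names, fields, and the single "requires hardware" condition); real license servers speak DRM-specific protocols and never hand out keys in the clear.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    key_id: str
    content_key: bytes
    expires_at: float        # Unix timestamp after which the key must not be used
    requires_hardware: bool  # True if decryption must happen in hardware DRM


class ToyLicenseServer:
    """In-memory stand-in for a DRM license server (illustration only)."""

    def __init__(self):
        self._keys = {}  # key ID -> (content key, requires_hardware)

    def register_key(self, key_id, content_key, requires_hardware):
        self._keys[key_id] = (content_key, requires_hardware)

    def issue_license(self, key_id, device_has_hardware_drm,
                      validity_seconds=3600, now=None):
        """Look up the key by its ID and wrap it in a license with conditions."""
        if key_id not in self._keys:
            raise KeyError(f"unknown key ID: {key_id}")
        content_key, requires_hardware = self._keys[key_id]
        if requires_hardware and not device_has_hardware_drm:
            raise PermissionError("device does not meet the required security level")
        now = time.time() if now is None else now
        return License(key_id, content_key, now + validity_seconds, requires_hardware)


if __name__ == "__main__":
    server = ToyLicenseServer()
    server.register_key("kid-1", b"\x00" * 16, requires_hardware=True)
    print(server.issue_license("kid-1", device_has_hardware_drm=True))
```

The device only ever sends the key ID; the conditions (validity, security level) travel back with the key inside the license.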
A content decryption module (CDM) can be either software (a compiled binary library stored in a browser) or a hardware solution. The CDM decrypts the DRM-protected content using the key issued by the DRM license server.
Encrypted Media Extensions (EME) is specified as a communication layer between the CDM and web browsers to play HTML5 video protected by DRM. It makes obsolete the third-party plugins, such as Adobe Flash or Microsoft Silverlight, that provided this functionality in the past.
DRM layers of protection
Security level defines the minimal level of DRM security requested, and specifies where the decryption can be processed. Levels of security can differ across DRMs, but in general they can be divided into software DRM (less secure, usually up to SD quality) and hardware DRM (required for HD and higher).
|Resolution|Security level|Output protection|Note|
|SD (up to 576p)|L3 / SL2000 (Software)|-|Less strict DRM protection|
|HD (up to 720p/1080p)|L1 / SL3000 (Hardware)|Any HDCP|Stricter DRM protection|
|UHD (4K/8K)|L1 / SL3000 (Hardware)|HDCP v2.2+|Highest DRM protection (may require additional security like forensic watermarking)|
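The table above can be read as a simple lookup from resolution to required protection. The sketch below encodes it in Python; the function name is hypothetical, and the thresholds are taken directly from the table, while real rules depend on the individual studio contracts.

```python
def required_protection(height):
    """Map a vertical resolution in pixels to (security level, output protection)."""
    if height <= 0:
        raise ValueError("height must be positive")
    if height <= 576:   # SD
        return ("L3 / SL2000 (software)", None)
    if height <= 1080:  # HD
        return ("L1 / SL3000 (hardware)", "any HDCP")
    return ("L1 / SL3000 (hardware)", "HDCP v2.2+")  # UHD


if __name__ == "__main__":
    for height in (480, 1080, 2160):
        print(height, required_protection(height))
```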
High-bandwidth Digital Content Protection (HDCP) is a security measure used to transfer content to a display over compatible digital outputs, including DisplayPort, DVI, or HDMI. HDCP prevents unauthorized devices from reading high-quality content. Both the display and the digital output must be certified hardware and must support HDCP; otherwise the content is downgraded to a lower quality. If HDCP is missing on either side, playback fails. HDCP requirements can be expressed by content type (0 – any version, 1 – 2.0 and higher), or directly by version.
FairPlay and PlayReady use content type, and Widevine uses versioning.
Be aware that DRM encryption is not the same as the encryption HDCP uses when sending the signal over the cable.
Forensic / digital watermarking
Forensic watermarking (or digital watermarking) is the practice of adding invisible markers into video tracks to uniquely identify authorized users. It can be understood as something like a signature. A simplified explanation is that, at the beginning of the packaging process, we have two variants of the stream, one marked with the character A and the other with B. The playlist for each user is then composed as a unique sequence of segments from both streams. This signature can’t be removed by re-encoding the content, so it can be easily used for tracing leaked streams.
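The simplified A/B scheme can be sketched in a few lines of Python. This is an assumption-laden toy: the user ID is encoded bit by bit (0 picks variant A, 1 picks variant B), and the segment file names are hypothetical. Production systems use longer, error-tolerant sequences.

```python
def playlist_for_user(user_id, num_segments):
    """Build a per-user playlist by picking variant A or B for each segment."""
    if user_id < 0 or user_id >= 2 ** num_segments:
        raise ValueError("user_id does not fit into num_segments bits")
    bits = format(user_id, f"0{num_segments}b")  # most significant bit first
    return [f"seg{i}_{'B' if bit == '1' else 'A'}" for i, bit in enumerate(bits)]


def user_from_leak(variants):
    """Recover the user ID from the A/B sequence detected in a leaked stream."""
    return int("".join("1" if v == "B" else "0" for v in variants), 2)


if __name__ == "__main__":
    print(playlist_for_user(5, 4))  # ['seg0_A', 'seg1_B', 'seg2_A', 'seg3_B']
    print(user_from_leak("ABAB"))   # 5
```

Because the mark lives in the picture itself, the detected A/B sequence survives re-encoding, and decoding it points back to the account the playlist was generated for.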
Insight into content delivery
A content delivery network (CDN) is a set of servers distributed across a country or continent to increase the speed of content distribution. It usually consists of many powerful cache servers connected in a layered/tree topology.
Origin servers contain all of the catalog content, and are therefore a kind of source of truth. They have massive disk capacity and don’t serve the content directly to customers; they are protected behind edge servers, or mid-tier servers that sit between origins and edges. Mid-tier/edge servers hold the most popular content in a local cache with less capacity but more speed (RAM or fast disks like SSDs), in order to serve the higher demand for popular content very quickly. The effectiveness of a given CDN architecture can be evaluated by its cache-hit ratio. The higher the ratio, the more requests are served from the local caches (faster response), with fewer requests passed to lower layers (and, eventually, to origins). Such architectures, just at a larger scale, are also used by big third-party CDN providers like Akamai, Limelight and CloudFront.
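As a rough illustration of the cache-hit ratio, the following Python sketch replays a list of requests against a small cache. The least-recently-used (LRU) eviction policy is an assumption made for the example; the text above does not say which policy an edge server uses.

```python
from collections import OrderedDict


def cache_hit_ratio(requests, capacity):
    """Replay requests against an LRU cache and return hits / total requests."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for item in requests:
        if item in cache:
            hits += 1
            cache.move_to_end(item)        # mark as most recently used
        else:
            cache[item] = True             # miss: fetch from the lower layer
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used item
    return hits / len(requests) if requests else 0.0


if __name__ == "__main__":
    requests = ["a", "b", "a", "c", "a", "b"]
    print(cache_hit_ratio(requests, capacity=2))
    print(cache_hit_ratio(requests, capacity=3))
```

In this toy run, the larger cache serves more of the same requests locally, which is the effect the ratio is meant to capture.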
You can find out more about the Showmax content delivery network in the CDN series.
Quality of experience
All of the terms mentioned within this series describe factors that influence the quality of our customers’ experience. Quality of experience (QoE), our very own metric, describes how satisfied our customers are with the product we provide them. It’s an attempt to objectively measure subjective opinion.
Some general QoE metrics worth mentioning:
Buffering time is the time the player spends waiting for data/content while playback is paused. A related value is re-buffering, which is calculated as the ratio of buffering time to consumed/played time. It can be influenced by an overloaded CDN, network issues between the CDN and the device, and several other things. In general, lower video bitrates are less prone to buffering.
Another one is the number of failures during or before real playback that cause unrecoverable interruption. They can be caused by bugs on the backend or frontend, issues with the network between CDN and the device, damaged encoded video titles, and more.
Last but not least is average bitrate, which tells us the average video quality the user watched, and which is often tightly connected to buffering. A lower average bitrate usually means that viewers watched lower-quality video, which can be a precursor of buffering issues. It can be caused by an overloaded CDN or device, limited network capacity, etc.
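Two of the metrics above reduce to simple arithmetic, sketched here in Python. The function names and the input shapes are hypothetical, and weighting the average bitrate by playback time is an assumption about how the average is taken.

```python
def rebuffering_ratio(buffering_seconds, played_seconds):
    """Ratio of time spent buffering to time actually played."""
    if played_seconds <= 0:
        raise ValueError("played_seconds must be positive")
    return buffering_seconds / played_seconds


def average_bitrate(intervals):
    """Time-weighted average bitrate from (bitrate_kbps, seconds) pairs."""
    total_seconds = sum(seconds for _, seconds in intervals)
    if total_seconds <= 0:
        raise ValueError("total playback time must be positive")
    return sum(kbps * seconds for kbps, seconds in intervals) / total_seconds


if __name__ == "__main__":
    print(rebuffering_ratio(6, 600))                   # 0.01
    print(average_bitrate([(1000, 30), (3000, 10)]))   # 1500.0
```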
Mean opinion score (MOS) is an arithmetic combination of individual values/metrics that expresses a user’s QoE in one value. It’s very useful to define the relative importance of the metrics, and it helps us set priorities for improvements — as an improvement in one metric can cause degradation in others and vice versa. For example, we can decrease the buffering time and get a smoother stream by lowering the average bitrate. On the other hand, with the lower bitrate we also decrease the picture quality. So we need to clearly identify the higher priority, whether it is lower buffering or better picture quality.
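One possible way to combine metrics into a single value is a weighted average, sketched below. The metric names, the normalisation to a 0..1 range (1 being best), and the weights are all hypothetical; they only illustrate how relative importance can be expressed, not how our score is actually computed.

```python
def combined_score(metrics, weights):
    """Weighted average of metric scores, each already normalised to 0..1."""
    if set(metrics) != set(weights):
        raise ValueError("metrics and weights must use the same names")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(metrics[name] * weights[name] for name in metrics) / total_weight


if __name__ == "__main__":
    weights = {"buffering": 3, "bitrate": 1}  # buffering matters three times more
    smooth_low_quality = {"buffering": 0.9, "bitrate": 0.5}
    choppy_high_quality = {"buffering": 0.5, "bitrate": 0.9}
    print(combined_score(smooth_low_quality, weights))
    print(combined_score(choppy_high_quality, weights))
```

With these weights the smoother stream scores higher despite its lower bitrate; swapping the weights would reverse the ranking, which is exactly the prioritisation decision described above.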
In the edition about encoding parameters, we omitted how to evaluate the outputs produced by different encoder configurations. Popular mathematical approaches are Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM). They evaluate picture quality in a strictly mathematical way, and do not necessarily take into account the subjective perception of humans. Despite this, they are still heavily used as baseline evaluation methods: anything with a PSNR below 35 dB can be considered ugly (e.g., with artifacts), while scores above 45 dB are considered indistinguishable from the original source (and therefore a waste of bits).
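For reference, PSNR is derived from the mean squared error between the original and the encoded picture. A minimal pure-Python sketch, assuming 8-bit samples passed as flat lists of pixel values (real tools compute this per frame and per plane):

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [52, 55, 61, 66]
    encoded = [53, 54, 62, 65]  # every sample is off by one
    print(round(psnr(original, encoded), 2))
```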
Subjective video quality can be predicted by Video Multimethod Assessment Fusion (VMAF), which takes into account the perception of quality by human eyes. A well-balanced score of quality and reasonable data consumption may impact the aforementioned QoE metrics and lead to their improvement. The trend is to keep bitrate low and quality high, which are in most cases two contradictory goals. You can read more about using VMAF to find the optimal bitrate ladder in Jan Ozer’s posts 1 and 2.
Keep on reading…and learning
After finishing this series introducing the common terms from the media engineering world — the core of our business — it’s easy to see why providing a quality SVOD service requires the joint effort of several teams. It’s challenging, and we love it.
Continue your discovery path by reading another series of blog posts, this one about delivering the content the right way. Our great CDN series starts here: Selecting the right URL. In case you’ve already read the beginning, you can continue with the second or third part.