Skip to content

Adventures in EDR, Part 2: Metal

Published:

In part 1, I covered the various ways of rendering and displaying EDR images in SwiftUI. Now in part 2, I go deeper into color spaces, transfer functions, and rendering EDR content in Metal.

Back in part 1, I mentioned attempting to render RAW images in Metal since I couldn’t figure out how to get EDR working until I revisited it while writing this blog post. As I already use Metal to render Composure’s viewfinder/video preview, I started by attempting to render this SDR content in EDR, and quickly realized that this could be a useful way of boosting brightness if the user was in a very bright environment.

The Metal RAW image viewer was shelved due to performance issues and replaced with the native SwiftUI implementation, and the EDR live preview ended up being almost useless. Despite this, I finished the EDR live preview anyway - mostly because I had also spent a lot of time on this. and learnt a lot about color spaces, transfer functions and Metal EDR support in the process.

Color Spaces and Transfer Functions

Very roughly, a color space describes a way of mapping some input vector - often RGB values, into a color that can be seen by human eyeballs.

sRGB is the most common color space you will come across, but recent Apple displays and cameras now use Display P3, which supports a wider color gamut. BT.2020 is wider still (BT.2100 uses the same color space but supports HDR).

A comparison of RGB gamuts of sRGB, P3, Rec2020, etc. using the CIE1931
chromaticity
diagram.

Visualization by Myndex, CC BY-SA 4.0 DEED. Modified to partially support dark mode.

If you want to learn more, David Gavian has a good blog post on Display P3 and conversions to and from sRGB.

Transfer Functions

A transfer function, also known as an electro-optical transfer function (EOTF), is a function which maps an input signal into light intensity.

Like with hearing, humans perceive brightness non-linearly: increasing the brightness by some fixed value will be more noticeable in dark areas compared to light areas.

Comparison between linear and gamma transfer functions

Hence, if you have a limited bitdepth, ‘spending’ more on that budget on darker areas will reduce the amount of banding perceived. If you want to learn more about this, this 10 part series covers the history of gamma, its issues and alternatives.

The sRGB gamma transfer function, roughly γ=2.2, raises values to the power of 2.2 to convert pixel values to brightness. DisplayP3 applies the same EOTF function as sRGB, while BT.2020 uses the same as BT.709 (the standard for HD TV).

For displaying HDR content in Metal (with BT.2100 or Display P3), there are two main contenders: hybrid log-gamma and perceptual quantization.

Hybrid-log gamma, as the name suggests, uses both a gamma and a log function. Assuming a range and domain of [0, 1] (as is done under BT.2100):

Hence, HLG is limited to a maximum brightness of 12 times SDR white. The iPhone 15 Pro has a maximum brightness of 8 times SDR, so at least for now, HLG is sufficient to cover the capabilities of iPhone displays.

HLG is backwards-compatible with BT.709 (which is similar but not equivalent to sRGB in ways I don’t understand): simply ignore the top half of the range, clipping it to white.

If you encode SDR content with HLG, you effectively lose one bit of bitdepth. However, you will at minimum have 10-bits, so SDR content will cover 9-bits; still more than the 8-bits typically used for SDR content.

The perceptual quantizer is used by HDR10 and has a massive maximum luminance of 10,000 nits. In comparison, monitors are in the range of 300–500 nits, and the iPhone 15 Pro only goes up to a maximum of 2,000 nits. The transfer function is supposed to map directly to output luminance, but on iOS, the values are actually relative to the current display brightness. 100 nits is assumed to be SDR white, giving PQ a maximum headroom of 100.

HDR in Metal

Headroom

If you want to support HDR rendering, you should first determine if the display even supports EDR.

This is pretty simple: check if the screen’s potential EDR headroom is larger than 1:

self.view.window?.screen.potentialEDRHeadroom

Headroom is the multiple beyond SDR white that the display can output. For example, a value of 2 means that, under optimum conditions, the content can be rendered twice as bright as SDR white.

If there is headroom, you can enable EDR by setting CAMetalLayer.wantsExtendedDynamicRangeContent to true.

Once set, UIScreen.currentEDRHeadroom will ramp up over a few seconds, but there is no guarantee that it will reach the potential headroom. This varies, most notably depending on the current (SDR) brightness: when at maximum brightness, the headroom is likely to be very small. Additionally, even the same headroom is much less noticeable at higher brightness to our eyes.

The following table shows the maximum potential headroom, and then the headrooms observed when the device was at maximum brightness:

DevicePotential headroomMin headroom
iPhone Xs4~1.2
iPhone 112~1
iPhone 15 Pro8~2

This of course, means that using EDR to boost the maximum brightness barely work. Unfortunately, I didn’t realize this until pretty late into development as I was developing and testing inside where I never needed to use maximum brightness.

Color Space and Pixel Format

According to this WWDC 2022 session, EDR rendering in Metal is only supported by two pixel formats and a few color spaces:

.rgba16Float (64-bit).bgra10a2Unorm (32-bit)
.extendedLinearSRGB.itur_709_PQ
.extendedLinearDisplayP3.displayP3_PQ
.extendedLinearITUR_2020.itur_2100_PQ
.displayP3_HLG
.itur_2100_HLG

.rgba16Float

.rgba16Float uses halfs for each component, requiring 64-bits per pixel, doubling the storage requirements compared to a standard 8-bit texture. This is the pixel format used by iOS (and macOS) internally when EDR is enabled.

The extendedLinear* color spaces map [0, 1] to SDR content while anything beyond that is in HDR land. As such, normalized integer pixel formats can’t support these linear color spaces.

Quantization error is not an issue: one useful property of floating point formats is that accuracy is high for small values and decreases with magnitude, mirroring the behavior of human visual perception.

While double the size of .bgra10a2Unorm, working in linear space is very easy. For example, to map SDR content in [0, 1] to HDR, you can simply multiply the content by the currentEDRHeadroom value. Just remember to linearize the input first if necessary.

.bgra10a2Unorm

bgra10a2Unorm stores each component as 10-bit integers but are interpreted as normalized values - as a linear mapping to [0, 1].

It only needs 32-bits per pixel - the same as a standard 8-bit texture, although you only have 2 bits of alpha. It also seems like .bgr10_xr can be also be used, although this format has no alpha channel.

If the display’s headroom is less than that of the content, highlights can get clipped, which can be very ugly and distracting. Tone-mapping is used to reduce the dynamic range of the content to match that of the display in a natural-looking way.

iOS can do this automatically, albeit at the cost of additional processing, by setting the CAMetalLayer.edrMetadata property. For HLG, there an easy .hlg option, but for PQ, we need to use .hdr10() options, which require a bunch of parameters which I couldn’t figure out how to set.

Hybrid-Log Gamma

HLG was what I ended up using as I struggled too much with PQ initially. I used displayP3_HLG rather than itur_2100_HLG as the iOS camera uses the P3 color space, removing the need for an additional color space transform.

In the shader, the process is as follows:

Then, at some point along the chain, the HLG EOTF transform will be applied, re-linearizing the image for display.

While I initially used the .hlg tone-mapping and simply mapped the SDR content into the full HLG range, I switched to using manual tone-mapping so that I could smoothly transition down from EDR to SDR with no noticeable jump (SDR to EDR is done ‘automatically’ as the headroom ramp-up takes time).

I finally ended up with the following shader code.

// Constants for hybrid-log-gamma transform, as defined by Rec. 2100
// See: https://en.wikipedia.org/wiki/Hybrid_log%E2%80%93gamma
#define hlgA (0.17883277)
#define hlgB (1 - 4 * hlgA)
#define hlgC (0.5 - hlgA * log(4 * hlgA))

/// Hybrid-log-gamma transform, as defined by Rec. 2100
float hlgOETF(float O) {
  // Rec. 2100 HLG curve.
  // Points (0, 0), (0,5, 1/12) (reference white), (1, 1) (12x reference white)

  if (O < 1.0 / 12.0) {
    return sqrt(3 * O);
  } else {
    return hlgA * log(12.0 * O - hlgB) + hlgC;
  }
}

/// Inverse hybrid-log gamma transform, as defined by Rec. 2100
float hlgEOTF(float E) {
  if (E < 0.5) {
    return pow(E, 2) / 3.0;
  } else {
    return (exp((E - hlgC) / hlgA) + hlgB) / 12.0;
  }
}

/// sRGB gamma function
float srgbEOTF(float E) {
  // https://www.color.org/srgb.pdf
  if (E <= 0.04045) {
    return E / 12.92;
  } else {
    return pow((E + 0.055) / 1.055, 2.4);
  }
}

float sdrToEdr(float sample, float edrHeadroom) {
  // Linearize the source sample
  float linearized = srgbEOTF(sample);

  // If headroom=1, map white to `0.5`
  // If headroom=12, map white to `1.0`
  float maxOut = 0.5 + 0.5 * ((edrHeadroom - 1.0) / (12.0 - 1.0));

  // Use the EOTF to transform output pixel values into linear brightness.
  // This can be moved onto the CPU - calculate once per frame
  // intead of repeating for every fragment
  float maxValue = hlgEOTF(maxOut);

  // Apply the inverse EOTF; the OETF
  // Multiply by maxValue such that `1` will map to `edrHeadroom` after being
  // re-linearized
  return hlgOETF(linearized * maxValue);
}

Perceptual Quantizer

PQ maps input values directly to output luminance, but I until writing this post, I couldn’t figure out a way of determining the PQ values for SDR white or the EDR headroom.

This WWDC 2021 session on HDR rendering, states that PQ content can rendered on a floating-point EDR texture by applying the EOTF to linearize the content and then dividing by 100 nits to map reference white to 1.0.

Later on, they describe the process of determining the brightest pixel value:

var linearMaxComponents: [CGFloat] = [edrHeadroom, edrHeadroom, edrHeadroom, 1.0]
guard let linearMaxColor = CGColor(
    colorSpace: .init(name: CGColorSpace.extendedLinearDisplayP3)!,
    components: &linearMaxComponents
), let pqMaxColor = linearMaxColor.converted(
    to: .init(name: CGColorSpace.displayP3_PQ), // or itur_2100_PQ
    intent: .defaultIntent,
    options: nil
), let pqMax = pqMaxColor.components?.first else {
    return
}

Applying the PQ EOTF function to this value confirms that iOS simply treats a transformed PQ-encoded value of 100 (nits) as SDR white, regardless of the actual screen brightness.

I wondered how PQ could output luminance directly while still allowing the user to adjust the brightness of their content, and the answer seems to be that it doesn’t; Apple simply treats PQ output as being relative to SDR white.

Regardless, this makes mapping SDR content to EDR with PQ easy: multiply the linear value by 100 times the headroom, then apply the OETF. If you instead use system tone mapping, opticalOutputScale of the CAEDRMetadata.hdr10 constructors should be set to 100, although I’m unsure what the other properties should be.

Conclusion

While this didn’t end up being a useful feature, it was fun to dive deep into the world of color science and learn a little about the complexity of color management, representation and encoding.

Video showing EDR in action. On the left is a phone at low brightness; the image gets significantly brighter and the UI begins to look grey in comparison. On the right is a phone at maximum brightness; the image gets a tiny bit brighter. Note that EDR is fully automatic on release builds, enabling itself only when at max brightness.

If you’ve enjoyed reading this, go ahead and download Composure Camera for free and be underwhelmed when you see the EDR badge appear.