In part 1, I covered the various ways of rendering and displaying EDR images in SwiftUI. Now in part 2, I go deeper into color spaces, transfer functions, and rendering EDR content in Metal.
Back in part 1, I mentioned attempting to render RAW images in Metal since I couldn’t figure out how to get EDR working until I revisited it while writing this blog post. As I already use Metal to render Composure’s viewfinder/video preview, I started by attempting to render this SDR content in EDR, and quickly realized that this could be a useful way of boosting brightness if the user was in a very bright environment.
The Metal RAW image viewer was shelved due to performance issues and replaced with the native SwiftUI implementation, and the EDR live preview ended up being almost useless. Despite this, I finished the EDR live preview anyway - mostly because I had also spent a lot of time on this. and learnt a lot about color spaces, transfer functions and Metal EDR support in the process.
Color Spaces and Transfer Functions
Very roughly, a color space describes a way of mapping some input vector - often RGB values, into a color that can be seen by human eyeballs.
sRGB is the most common color space you will come across, but recent Apple displays and cameras now use Display P3, which supports a wider color gamut. BT.2020 is wider still (BT.2100 uses the same color space but supports HDR).
If you want to learn more, David Gavian has a good blog post on Display P3 and conversions to and from sRGB.
Transfer Functions
A transfer function, also known as an electro-optical transfer function (EOTF), is a function which maps an input signal into light intensity.
Like with hearing, humans perceive brightness non-linearly: increasing the brightness by some fixed value will be more noticeable in dark areas compared to light areas.
Hence, if you have a limited bitdepth, ‘spending’ more on that budget on darker areas will reduce the amount of banding perceived. If you want to learn more about this, this 10 part series covers the history of gamma, its issues and alternatives.
The sRGB gamma transfer function, roughly γ=2.2
,
raises values to the power of 2.2 to convert pixel values to brightness.
DisplayP3 applies the same EOTF function as sRGB, while BT.2020 uses the same
as BT.709 (the standard for HD TV).
For displaying HDR content in Metal (with BT.2100 or Display P3), there are two main contenders: hybrid log-gamma and perceptual quantization.
Hybrid-log gamma, as the name suggests, uses both a gamma and a log function.
Assuming a range and domain of [0, 1]
(as is done under BT.2100):
- Inputs in the range
[0.0, 0.5]
use a gamma function and map to a brightness of[0, 1/12]
, where1/12
is SDR white. - Inputs in the range
[0.5, 1.0]
use a log function and map to[1/12, 1]
.
Hence, HLG is limited to a maximum brightness of 12 times SDR white. The iPhone 15 Pro has a maximum brightness of 8 times SDR, so at least for now, HLG is sufficient to cover the capabilities of iPhone displays.
HLG is backwards-compatible with BT.709 (which is similar but not equivalent to sRGB in ways I don’t understand): simply ignore the top half of the range, clipping it to white.
If you encode SDR content with HLG, you effectively lose one bit of bitdepth. However, you will at minimum have 10-bits, so SDR content will cover 9-bits; still more than the 8-bits typically used for SDR content.
The perceptual quantizer is used by HDR10 and has a massive maximum luminance of 10,000 nits. In comparison, monitors are in the range of 300–500 nits, and the iPhone 15 Pro only goes up to a maximum of 2,000 nits. The transfer function is supposed to map directly to output luminance, but on iOS, the values are actually relative to the current display brightness. 100 nits is assumed to be SDR white, giving PQ a maximum headroom of 100.
HDR in Metal
Headroom
If you want to support HDR rendering, you should first determine if the display even supports EDR.
This is pretty simple: check if the screen’s potential EDR headroom is larger than 1
:
self.view.window?.screen.potentialEDRHeadroom
Headroom is the multiple beyond SDR white that the display can output. For
example, a value of 2
means that, under optimum conditions, the content can be
rendered twice as bright as SDR white.
If there is headroom, you can enable EDR by setting
CAMetalLayer.wantsExtendedDynamicRangeContent
to true.
Once set, UIScreen.currentEDRHeadroom
will ramp up over a few seconds, but
there is no guarantee that it will reach the potential headroom. This varies,
most notably depending on the current (SDR) brightness:
when at maximum brightness, the headroom is likely to be very small.
Additionally, even the same headroom is much less noticeable at higher brightness
to our eyes.
The following table shows the maximum potential headroom, and then the headrooms observed when the device was at maximum brightness:
Device | Potential headroom | Min headroom |
---|---|---|
iPhone Xs | 4 | ~1.2 |
iPhone 11 | 2 | ~1 |
iPhone 15 Pro | 8 | ~2 |
This of course, means that using EDR to boost the maximum brightness barely work. Unfortunately, I didn’t realize this until pretty late into development as I was developing and testing inside where I never needed to use maximum brightness.
Color Space and Pixel Format
According to this WWDC 2022 session, EDR rendering in Metal is only supported by two pixel formats and a few color spaces:
.rgba16Float (64-bit) | .bgra10a2Unorm (32-bit) |
---|---|
.extendedLinearSRGB | .itur_709_PQ |
.extendedLinearDisplayP3 | .displayP3_PQ |
.extendedLinearITUR_2020 | .itur_2100_PQ |
.displayP3_HLG | |
.itur_2100_HLG |
.rgba16Float
.rgba16Float
uses half
s for each component, requiring 64-bits per pixel,
doubling the storage requirements compared to a standard 8-bit texture. This is
the pixel format used by iOS (and macOS)
internally
when EDR is enabled.
The extendedLinear*
color spaces map
[0, 1]
to SDR content
while anything beyond that is in HDR land. As such, normalized integer pixel
formats can’t support these linear color spaces.
Quantization error is not an issue: one useful property of floating point formats is that accuracy is high for small values and decreases with magnitude, mirroring the behavior of human visual perception.
While double the size of .bgra10a2Unorm
, working in linear space is very easy.
For example, to map SDR content in [0, 1]
to HDR, you can simply multiply the
content by the currentEDRHeadroom
value. Just remember to linearize the input
first if necessary.
.bgra10a2Unorm
bgra10a2Unorm
stores each component as 10-bit integers but are interpreted as
normalized values - as a linear mapping to [0, 1]
.
It only needs 32-bits per pixel - the same as a standard 8-bit texture, although
you only have 2 bits of alpha. It also seems like .bgr10_xr
can be also be
used, although this format has no alpha channel.
If the display’s headroom is less than that of the content, highlights can get clipped, which can be very ugly and distracting. Tone-mapping is used to reduce the dynamic range of the content to match that of the display in a natural-looking way.
iOS can do this automatically, albeit at the cost of additional processing,
by setting the CAMetalLayer.edrMetadata
property. For HLG, there an easy .hlg
option, but for PQ, we need to use .hdr10()
options, which require a bunch of
parameters which I couldn’t figure out how to set.
Hybrid-Log Gamma
HLG was what I ended up using as I struggled too much with PQ initially. I
used displayP3_HLG
rather than itur_2100_HLG
as the iOS camera uses the
P3 color space,
removing the need for an additional color space transform.
In the shader, the process is as follows:
- I receive an image in the DisplayP3 color space
- I apply the sRGB/Display P3 gamma transform (EOTF), linearizing the image
- I apply the HLG OETF transform
Then, at some point along the chain, the HLG EOTF transform will be applied, re-linearizing the image for display.
While I initially used the .hlg
tone-mapping and simply mapped the SDR content
into the full HLG range, I switched to using manual tone-mapping so that I could
smoothly transition down from EDR to SDR with no noticeable jump (SDR to EDR
is done ‘automatically’ as the headroom ramp-up takes time).
I finally ended up with the following shader code.
// Constants for hybrid-log-gamma transform, as defined by Rec. 2100
// See: https://en.wikipedia.org/wiki/Hybrid_log%E2%80%93gamma
#define hlgA (0.17883277)
#define hlgB (1 - 4 * hlgA)
#define hlgC (0.5 - hlgA * log(4 * hlgA))
/// Hybrid-log-gamma transform, as defined by Rec. 2100
float hlgOETF(float O) {
// Rec. 2100 HLG curve.
// Points (0, 0), (0,5, 1/12) (reference white), (1, 1) (12x reference white)
if (O < 1.0 / 12.0) {
return sqrt(3 * O);
} else {
return hlgA * log(12.0 * O - hlgB) + hlgC;
}
}
/// Inverse hybrid-log gamma transform, as defined by Rec. 2100
float hlgEOTF(float E) {
if (E < 0.5) {
return pow(E, 2) / 3.0;
} else {
return (exp((E - hlgC) / hlgA) + hlgB) / 12.0;
}
}
/// sRGB gamma function
float srgbEOTF(float E) {
// https://www.color.org/srgb.pdf
if (E <= 0.04045) {
return E / 12.92;
} else {
return pow((E + 0.055) / 1.055, 2.4);
}
}
float sdrToEdr(float sample, float edrHeadroom) {
// Linearize the source sample
float linearized = srgbEOTF(sample);
// If headroom=1, map white to `0.5`
// If headroom=12, map white to `1.0`
float maxOut = 0.5 + 0.5 * ((edrHeadroom - 1.0) / (12.0 - 1.0));
// Use the EOTF to transform output pixel values into linear brightness.
// This can be moved onto the CPU - calculate once per frame
// intead of repeating for every fragment
float maxValue = hlgEOTF(maxOut);
// Apply the inverse EOTF; the OETF
// Multiply by maxValue such that `1` will map to `edrHeadroom` after being
// re-linearized
return hlgOETF(linearized * maxValue);
}
Perceptual Quantizer
PQ maps input values directly to output luminance, but I until writing this post, I couldn’t figure out a way of determining the PQ values for SDR white or the EDR headroom.
This WWDC 2021 session on HDR rendering, states that PQ content can rendered on a floating-point EDR texture by applying the EOTF to linearize the content and then dividing by 100 nits to map reference white to 1.0.
Later on, they describe the process of determining the brightest pixel value:
var linearMaxComponents: [CGFloat] = [edrHeadroom, edrHeadroom, edrHeadroom, 1.0]
guard let linearMaxColor = CGColor(
colorSpace: .init(name: CGColorSpace.extendedLinearDisplayP3)!,
components: &linearMaxComponents
), let pqMaxColor = linearMaxColor.converted(
to: .init(name: CGColorSpace.displayP3_PQ), // or itur_2100_PQ
intent: .defaultIntent,
options: nil
), let pqMax = pqMaxColor.components?.first else {
return
}
Applying the PQ EOTF function to this value confirms that iOS simply treats a transformed PQ-encoded value of 100 (nits) as SDR white, regardless of the actual screen brightness.
I wondered how PQ could output luminance directly while still allowing the user to adjust the brightness of their content, and the answer seems to be that it doesn’t; Apple simply treats PQ output as being relative to SDR white.
Regardless, this makes mapping SDR content to EDR with PQ easy: multiply the
linear value by 100 times the headroom, then apply the OETF. If you instead
use system tone mapping, opticalOutputScale
of the CAEDRMetadata.hdr10
constructors should be
set to 100,
although I’m unsure what the other properties should be.
Conclusion
While this didn’t end up being a useful feature, it was fun to dive deep into the world of color science and learn a little about the complexity of color management, representation and encoding.
If you’ve enjoyed reading this, go ahead and download Composure Camera for free and be underwhelmed when you see the EDR badge appear.