Hi Krunoslav,

(I've dropped Rodrigo from the mail thread, as he seems to be no longer with AMD's display team, and also dropped the stable kernel list for the moment.)
On Sun, Mar 22, 2026 at 7:11 PM Kovac, Krunoslav <[email protected]> wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Hi Mario,
>
> I'm not on that mailing list, not sure how to reply, so I'll start by
> replying directly here.
>
> Thanks for the reply.
> There is a reason for the change that is alluded to in the commit, but
> perhaps I should have been clearer.
> If you have an ARGB2101010 surface and the monitor is 10 bpc, one of the HW
> design goals is that we can output this in a bit-perfect way, i.e., for
> every surface pixel value K = 0..1023, the monitor will receive K at its end.
> It's also one of the things some customers have checked for and complained
> about historically. This is very hard to see visually, or even with a
> colorimeter, but is readily apparent with a HW capture card.
>
> Our HW can accomplish this 10-bit-perfect requirement if set up correctly;
> however, it can only do so if we use rounding, not dithering.
> For example, say you have a 10-bit code 200: our pipeline precision and
> error accumulation may result in 200.15, which in 12 bpc before dithering
> would be 801, and the nature of the spatial dithering is that now and then
> the RNG result will push that to a 201 output, it's just the way it works.
> Rounding is several times less sensitive to this, and without this
> randomness component, we can verify we're always accurate enough at the
> 10 bpc level.

Ok, that makes a lot more sense. I can understand that. My own software and users have the same critical requirement for some use cases: being able to pass through ARGB8888 surfaces to 8 bpc video sinks. They connect special neuroscience display equipment that parses special binary control information out of false-color coded framebuffer images, or that implements very high color precision display, up to 16 bpc per color channel, on top of 8 bpc framebuffers and some shader magic.
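Just to make sure I understand your 200 -> 200.15 example correctly, here is a quick standalone toy model of the three reduction modes (plain userspace C, not DC code; the uniform RNG is only a stand-in for whatever spatial dither pattern the hardware really uses):

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy model: reduce a 12-bit internal pipeline value to 10 bits. */

static uint16_t to10_trunc(uint16_t v12)
{
	return v12 >> 2;		/* just drop the 2 extra bits */
}

static uint16_t to10_round(uint16_t v12)
{
	uint16_t v = (v12 + 2) >> 2;	/* round to nearest */
	return v > 1023 ? 1023 : v;	/* clamp at the 10-bit maximum */
}

static uint16_t to10_dither(uint16_t v12)
{
	/* rand() stands in for the hardware's spatial dither pattern */
	uint16_t v = (v12 + (rand() & 3)) >> 2;
	return v > 1023 ? 1023 : v;
}
```

With your example, 10-bit code 200 drifting to ~200.15 lands at 12-bit value 801 before bit depth reduction: truncation and rounding both always give 200 back (bit-perfect), while the dithered output flips to 201 for roughly a quarter of the pixels (801 / 4 = 200.25). That flip is exactly the one-off error your capture card customers see, and at the same time exactly the extra fractional precision my users want to keep.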
Luckily, using a standard 8 bpc framebuffer under the native X-Server, an 8 bpc DVI-D or DP video sink, and loading a specifically crafted gamma table achieved this for our case, despite the spatial dithering down to 8 bpc being active. DC has a special detection function (__is_lut_linear() in amdgpu_dm_color.c) that detects if a user-provided gamma LUT is essentially meant to be a linear identity mapping LUT, and if so, enables LUT bypass or identity mapping iirc, and that does the trick well enough for us atm, at least under OpenGL + Xorg on modern DCN display engines. For older DCE engines, my software does its own low-level MMIO register programming to get rid of unwanted dithering, or to enforce the dithering it needs, but this has become impractical/infeasible for DCN. I haven't tested this yet under Wayland, as the Wayland ecosystem is not ready for the more demanding use cases, and we still have to cling to the native X-Server for possibly quite a while longer.

Do you know what use cases those customers have for 10 bpc identity passthrough? I wonder if they are very similar to my use cases for 8 bpc identity passthrough.

The rounding mode poses a problem for some of my users, though: the research labs that can't afford or use highly specialized display equipment for the price of an upper-class car, and possibly not even "conventional" high-end display monitors ("reference monitors", "broadcast monitors", as used in movie and tv/streaming post-production etc.) with true 12 bpc input and processing. Or situations where video bandwidth limits enforce a 10 bpc output even for 12 bpc capable sinks. Those users can be fine with 10 bpc + GPU dithering.

> I believe we don't have surface info in that code, but one way to work
> around it would be to use spatial dithering for FP16/ARGB16 and rounding
> for 10 bits. But if we just switch to spatial, some of the earlier
> complaints about 10-bit output having one-off bit errors will be coming
> back.
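The policy you suggest, picking the reduction mode from the surface format, would look something like this in spirit (a standalone sketch; the enum names are merely patterned after DC's SURFACE_PIXEL_FORMAT_* / DITHER_OPTION_* identifiers, their values and the helper are made up, this is not actual driver code):

```c
#include <stdbool.h>

/* Toy enums, patterned after DC's naming; values are invented. */
enum toy_surface_format {
	TOY_FORMAT_ARGB8888,
	TOY_FORMAT_ARGB2101010,
	TOY_FORMAT_ARGB16161616,	/* 16 bpc fixed point */
	TOY_FORMAT_FP16,		/* 16 bpc floating point */
};

enum toy_dither_option {
	TOY_DITHER_SPATIAL10,		/* spatial dithering down to 10 bpc */
	TOY_DITHER_TRUN10,		/* round/truncate to 10 bpc */
};

/* Does the surface actually carry more than 10 bpc of payload? */
static bool toy_format_is_deep(enum toy_surface_format f)
{
	return f == TOY_FORMAT_FP16 || f == TOY_FORMAT_ARGB16161616;
}

/*
 * For a 10 bpc sink: dither only if the surface holds more than
 * 10 bpc, so that ARGB2101010 passthrough stays bit-perfect.
 */
static enum toy_dither_option toy_pick_10bpc_option(enum toy_surface_format f)
{
	return toy_format_is_deep(f) ? TOY_DITHER_SPATIAL10 : TOY_DITHER_TRUN10;
}
```

Note that this guess keeps 10 bpc surfaces bit-perfect, but it also means 8 or 10 bpc framebuffers combined with high-precision gamma tables/CTMs would lose the dithering benefit, which is part of my concern below.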
Looking at all callers of resource_build_bit_depth_reduction_params(), they all have access to the associated "struct pipe_ctx", which should give access to pipe_ctx->plane_state->format of an associated display plane. I could prepare a patch that passes the pipe_ctx from each caller into resource_build_bit_depth_reduction_params(), and that function could check if a 16 bpc framebuffer is in use and switch to spatial dithering down to 10 bpc in that case, and leave the rounding/truncation to 10 bpc otherwise.

This workaround, which you also propose, would be the least bad of all bad solutions. One goal of the current patch was to be easy to backport, and also to still make it into drm-fixes before Linux 7. Linux 7.0 will be the standard distribution kernel for the upcoming Ubuntu 26.04-LTS, and is therefore important for my users.

I think what would really be needed in the long term is a drm connector property to control dithering. Some kms drivers had this in the olden days, many years ago. I don't think a guess-o-matic will always guess right, given that having > 10 bpc precision via dithering will also be beneficial on 8 or 10 bpc framebuffers + gamma tables for most use cases.

Thanks,
mario

> Thanks,
> Kruno
>
> -----Original Message-----
> From: Mario Kleiner <[email protected]>
> Sent: Saturday, March 21, 2026 1:21 AM
> To: [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; Cyr, Aric <[email protected]>;
> Koo, Anthony <[email protected]>; Rodrigo Siqueira <[email protected]>;
> Kovac, Krunoslav <[email protected]>; Deucher, Alexander <[email protected]>
> Subject: [PATCH] drm/amd/display: Change dither policy for 10 bpc output
> back to dithering
>
> Commit d5df648ec830 ("drm/amd/display: Change dither policy for 10bpc to
> round") degraded display of 12 bpc color precision output to 10 bpc sinks
> by switching 10 bpc output from dithering to "truncate to 10 bpc".
>
> I don't find the argumentation in that commit convincing, but the
> consequences highly unfortunate, especially for applications that require
> effective > 10 bpc precision output of > 10 bpc framebuffers.
>
> The argument wasn't something strong like "there are hardware design
> defects or limitations which require us to work around broken dithering to
> 10 bpc", or "there are some special use cases which do require truncation
> to 10 bpc", but essentially "at some point in the past we used truncation
> in Polaris/Vega times and it looks like it got inadvertently changed for
> Navi, so let's do that again". I couldn't find evidence for that in the
> git commit logs. The commit message also acknowledges that using
> dithering "...makes some sense for FP16... ...but not for ARGB2101010
> surfaces..."
>
> The problem with this is that it makes fp16 surfaces, and especially
> rgba16 fixed point surfaces, less useful. These are now well supported by
> Mesa 25.3 and later via OpenGL + EGL, Vulkan/WSI, and by the OSS AMDVLK
> Vulkan/WSI/display path, and also by GNOME 50 mutter under Wayland, and
> they used to provide more than 10 bpc effective precision at the output.
>
> Even for 8 or 10 bpc surfaces, the color pipeline behind the framebuffer,
> e.g., gamma tables, CTM, can be used for color correction and will benefit
> from an effective > 10 bpc output precision via dithering, retaining some
> precision that would otherwise get lost on the way through the pipeline,
> e.g., due to non-linear gamma functions.
>
> Scientific apps rely on this for > 10 bpc display precision.
> Truncating to 10 bpc, instead of dithering the pipeline-internal 12 bpc
> precision down to 10 bpc, causes a serious loss of precision. This also
> creates the undesirable and slightly absurd situation that using a cheap
> monitor with only an 8 bpc input and display panel will yield roughly
> 12 bpc precision via dithering from 12 -> 8 bpc, whereas investment in a
> more expensive monitor with 10 bpc input and a native 10 bpc display will
> only yield 10 bpc, even if a fp16 or rgb16 framebuffer and/or a properly
> set up color pipeline (gamma tables, CTMs etc. with more than 10 bpc
> output precision) would allow effective 12 bpc precision output.
>
> Therefore this patch proposes reverting that commit and going back to
> dithering down to 10 bpc, consistent with the behaviour for 6 bpc or
> 8 bpc output.
>
> Successfully tested on AMD Polaris DCE 11.2 and Raven Ridge DCN 1.0 with
> a native 10 bpc capable monitor, outputting a RGBA16 unorm framebuffer
> and measuring the resulting color precision with a photometer. No
> apparent visual artifacts or problems were observed, and effective
> precision was measured to be 12 bpc again, as expected.
>
> Fixes: d5df648ec830 ("drm/amd/display: Change dither policy for 10bpc to round")
> Signed-off-by: Mario Kleiner <[email protected]>
> Tested-by: Mario Kleiner <[email protected]>
> Cc: [email protected]
> Cc: Aric Cyr <[email protected]>
> Cc: Anthony Koo <[email protected]>
> Cc: Rodrigo Siqueira <[email protected]>
> Cc: Krunoslav Kovac <[email protected]>
> Cc: Alex Deucher <[email protected]>
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
> index c9fbb64d706a..29db5404c4a0 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
> @@ -5056,7 +5056,7 @@ void resource_build_bit_depth_reduction_params(struct dc_stream_state *stream,
>  		option = DITHER_OPTION_SPATIAL8;
>  		break;
>  	case COLOR_DEPTH_101010:
> -		option = DITHER_OPTION_TRUN10;
> +		option = DITHER_OPTION_SPATIAL10;
>  		break;
>  	default:
>  		option = DITHER_OPTION_DISABLE;
> --
> 2.43.0
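P.S. In case it is useful for the discussion: the "cheap 8 bpc monitor beats the 10 bpc monitor" effect from the commit message is easy to reproduce numerically. Averaging many spatially dithered 8-bit outputs of the same 12-bit input recovers the fractional part, which is what a photometer integrates over an area of dithered pixels. Again a toy model, with a uniform RNG standing in for the real spatial pattern:

```c
#include <stdint.h>
#include <stdlib.h>

/* Dither a 12-bit value down to 8 bits (4 bits are dropped). */
static uint8_t to8_dither(uint16_t v12)
{
	uint16_t v = (v12 + (rand() & 15)) >> 4;
	return v > 255 ? 255 : (uint8_t)v;
}

/*
 * Average n dithered 8-bit samples and rescale to the 12-bit range.
 * For large n this approaches the original 12-bit value, i.e. the
 * area average carries ~12 bpc of information through an 8 bpc link.
 */
static double recovered_12bit(uint16_t v12, int n)
{
	long sum = 0;

	for (int i = 0; i < n; i++)
		sum += to8_dither(v12);
	return 16.0 * sum / n;
}
```

For example, 12-bit input 801 dithered to 8 bits yields mostly 50 with occasional 51s, and the rescaled average converges back to ~801, while rounding or truncation to 8 bits would pin every pixel at 50 (i.e. 800).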
