On 5/4/26 4:19 PM, Mikhail Gavrilov wrote:
dcn32_validate_bandwidth() wraps dcn32_internal_validate_bw() with
DC_FP_START()/DC_FP_END(). On x86 non-RT, DC_FP_START expands into
kernel_fpu_begin() which takes fpregs_lock(), i.e. local_bh_disable().
Allocations done inside this region must therefore not sleep.

The legacy DML1 path through dcn32_full_validate_bw_helper() ->
dcn32_add_phantom_pipes() -> dcn32_enable_phantom_plane() unconditionally
calls dc_state_create_phantom_plane() -> dc_create_plane_state(), which
performs kvzalloc(sizeof(struct dc_plane_state)). On a recent kernel
sizeof(struct dc_plane_state) is 343736 bytes (335 KiB), well above the
PAGE_ALLOC_COSTLY_ORDER threshold, so __kvmalloc_node() takes the vmalloc
path. __get_vm_area_node() then trips its BUG_ON(in_interrupt()) because
SOFTIRQ_DISABLE_OFFSET is set in preempt_count:

   kernel BUG at mm/vmalloc.c:3206!
   RIP: __get_vm_area_node+0x257/0x2d0
   Workqueue: events_unbound commit_work
   Call Trace:
    __vmalloc_node_range_noprof+0x22b/0x570
    __kvmalloc_node_noprof+0x3d0/0xb40
    dc_create_plane_state+0x35/0x290 [amdgpu]
    dc_state_create_phantom_plane+0x1a/0x120 [amdgpu]
    dcn32_enable_phantom_plane+0x101/0x780 [amdgpu]
    dcn32_add_phantom_pipes+0x47/0x460 [amdgpu]
    dcn32_full_validate_bw_helper.constprop.0+0xa46/0x1d70 [amdgpu]
    dcn32_internal_validate_bw+0x49c/0x1600 [amdgpu]
    dml1_validate+0x20f/0x800 [amdgpu]
    dcn32_validate_bandwidth+0x317/0x540 [amdgpu]
    dc_validate_with_context+0xd34/0x1d30 [amdgpu]
    dc_commit_streams+0x7ca/0x1810 [amdgpu]
    amdgpu_dm_commit_streams+0xfd4/0x1e60 [amdgpu]
    amdgpu_dm_atomic_commit_tail+0x29e/0x3520 [amdgpu]
    commit_tail+0x204/0x4b0
    process_one_work+0x8fd/0x16a0
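
For reference, the size argument above can be checked with a few lines of
userspace C. This is only a sketch: it assumes 4 KiB pages (x86-64) and
PAGE_ALLOC_COSTLY_ORDER == 3, and the sketch_* names are made up for
illustration, they are not kernel symbols.

```c
#include <stddef.h>

/* Assumptions for illustration: 4 KiB pages, PAGE_ALLOC_COSTLY_ORDER == 3. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_COSTLY_ORDER	3U

/* Smallest order such that (page size << order) >= size. */
unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Non-zero when a physically contiguous allocation would be "costly". */
int sketch_is_costly(size_t size)
{
	return sketch_get_order(size) > SKETCH_COSTLY_ORDER;
}
```

With size = 343736 the loop stops at order 7 (128 pages), which is above
the assumed costly order of 3.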

Per-CPU __preempt_count on the crashing CPU at panic time was 0x202:
SOFTIRQ_DISABLE_OFFSET (0x200) from fpregs_lock() plus two preempt holds
from dc_fpu_begin() and kernel_fpu_begin().
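
The 0x202 value can be decoded by hand with the sketch below. The masks
are assumptions that mirror the field layout in include/linux/preempt.h
(8 preempt bits, 8 softirq bits, 4 hardirq bits, 4 NMI bits); the
sketch_* helpers are hypothetical and ignore the x86 need-resched bit.

```c
#include <stdint.h>

/* Assumed field layout, mirroring include/linux/preempt.h. */
#define SKETCH_PREEMPT_MASK	0x000000ffu
#define SKETCH_SOFTIRQ_MASK	0x0000ff00u
#define SKETCH_HARDIRQ_MASK	0x000f0000u
#define SKETCH_NMI_MASK		0x00f00000u

/* Number of nested preempt_disable() holds. */
unsigned int sketch_preempt_depth(uint32_t pc)
{
	return pc & SKETCH_PREEMPT_MASK;
}

/* Raw softirq field; 0x200 corresponds to one local_bh_disable(). */
unsigned int sketch_softirq_field(uint32_t pc)
{
	return pc & SKETCH_SOFTIRQ_MASK;
}

/* Rough non-RT equivalent of in_interrupt(): any NMI/hardirq/softirq bits. */
int sketch_in_interrupt(uint32_t pc)
{
	return (pc & (SKETCH_NMI_MASK | SKETCH_HARDIRQ_MASK |
		      SKETCH_SOFTIRQ_MASK)) != 0;
}
```

For 0x202 this gives a preempt depth of 2 and a softirq field of 0x200, so
the in_interrupt()-style check is true even though no softirq is running.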

The DML2 paths already wrap their large vzalloc()s in
DC_RUN_WITH_PREEMPTION_ENABLED() to handle this case (see
drivers/gpu/drm/amd/display/dc/dml2_0/dml21/dml21_wrapper.c:26 and
drivers/gpu/drm/amd/display/dc/dml2_0/dml2_wrapper.c:24). Apply the same
guard to the DML1 phantom-plane allocation in dcn32_enable_phantom_plane().
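
The general shape of such a guard is sketched below as a userspace mock:
leave the FP region if it is active, run the code, then re-enter. This is
not the amdgpu definition of DC_RUN_WITH_PREEMPTION_ENABLED(); every
sketch_* / SKETCH_* name is hypothetical and only models the idea with a
flag and two counters.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the FP region state and the allocator. */
bool sketch_fp_active;		/* true while "inside" the FP region */
int sketch_sleepable_allocs;	/* allocations made outside the region */
int sketch_atomic_allocs;	/* allocations made inside the region */

void sketch_fp_start(void) { sketch_fp_active = true; }
void sketch_fp_end(void)   { sketch_fp_active = false; }

/* Models a large allocation that may sleep. */
void sketch_big_alloc(void)
{
	if (sketch_fp_active)
		sketch_atomic_allocs++;
	else
		sketch_sleepable_allocs++;
}

/* Leave the FP region around 'code' if it was active, then restore it. */
#define SKETCH_RUN_WITH_PREEMPTION_ENABLED(code)	\
	do {						\
		bool was_active = sketch_fp_active;	\
		if (was_active)				\
			sketch_fp_end();		\
		code;					\
		if (was_active)				\
			sketch_fp_start();		\
	} while (0)

/* Returns 1 if the allocation ran outside the region and it was restored. */
int sketch_guarded_alloc_demo(void)
{
	int ok;

	sketch_fp_start();
	SKETCH_RUN_WITH_PREEMPTION_ENABLED(sketch_big_alloc());
	ok = sketch_fp_active && sketch_atomic_allocs == 0 &&
	     sketch_sleepable_allocs == 1;
	sketch_fp_end();
	return ok;
}
```

The point of the mock is only that the wrapped statement observes the
region as inactive while the caller still sees it active afterwards.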

This is a separate class of issue from "drm/amd/display: Fix unsafe uses
of kernel mode FPU" by Ard Biesheuvel, which addressed callers entering
DC FP compilation units without DC_FP_START. The bug fixed here is the
inverse: a sleeping allocator invoked from within an active DC_FP_START
region.

Reproducer (RX 7900 XTX, single 4K HDMI display, DCN 3.2): launch any
workload that produces rapid atomic modeset commits. The most reliable
trigger observed is launching Rise of the Tomb Raider via Proton and
repeatedly pressing the Super key during the level loading screen;
crash occurs within ~4 minutes uptime. Random crashes are also observed
during routine fullscreen toggles (image viewers, chat applications).

Hardware verified clean: memtest86+ 4 passes, stressapptest -W -m 32
4 hours, both pass with 0 errors. KASAN active, no reports under load.

Fixes: 235c67634230 ("drm/amd/display: add DCN32/321 specific files for Display Core")
Cc: [email protected] # v6.0+
Signed-off-by: Mikhail Gavrilov <[email protected]>
---
  .../drm/amd/display/dc/resource/dcn32/dcn32_resource.c    | 8 +++++++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
index 82f81b586986..3751f7a94a05 100644
--- a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
@@ -92,9 +92,14 @@
 #include "dml/dcn32/dcn32_fpu.h"

 #include "dc_state_priv.h"
+#include "dc_fpu.h"

 #include "dml2_0/dml2_wrapper.h"

+#if !defined(DC_RUN_WITH_PREEMPTION_ENABLED)
+#define DC_RUN_WITH_PREEMPTION_ENABLED(code) code
+#endif
+
 #define DC_LOGGER_INIT(logger)

 enum dcn32_clk_src_array_id {
@@ -1684,7 +1689,8 @@ static void dcn32_enable_phantom_plane(struct dc *dc,
                if (curr_pipe->top_pipe && curr_pipe->top_pipe->plane_state == curr_pipe->plane_state)
                        phantom_plane = prev_phantom_plane;
                else
-                       phantom_plane = dc_state_create_phantom_plane(dc, context, curr_pipe->plane_state);
+                       DC_RUN_WITH_PREEMPTION_ENABLED(phantom_plane =
+                               dc_state_create_phantom_plane(dc, context, curr_pipe->plane_state));

                if (!phantom_plane)
                        continue;

Thank you very much! I'll add this to our weekly testing before merging.
