On Mon, May 4, 2026 at 4:29 PM Mikhail Gavrilov <[email protected]> wrote:
>
> dcn32_validate_bandwidth() wraps dcn32_internal_validate_bw() with
> DC_FP_START()/DC_FP_END(). On x86 non-RT, DC_FP_START expands into
> kernel_fpu_begin() which takes fpregs_lock(), i.e. local_bh_disable().
> Allocations done inside this region must therefore not sleep.
>
> The legacy DML1 path through dcn32_full_validate_bw_helper() ->
> dcn32_add_phantom_pipes() -> dcn32_enable_phantom_plane() unconditionally
> calls dc_state_create_phantom_plane() -> dc_create_plane_state(), which
> performs kvzalloc(sizeof(struct dc_plane_state)). On a recent kernel
> sizeof(struct dc_plane_state) is 343736 bytes (335 KiB), well above the
> PAGE_ALLOC_COSTLY_ORDER threshold, so __kvmalloc_node() takes the vmalloc
> path. __get_vm_area_node() then trips its BUG_ON(in_interrupt()) because
> SOFTIRQ_DISABLE_OFFSET is set in preempt_count:
>
>   kernel BUG at mm/vmalloc.c:3206!
>   RIP: __get_vm_area_node+0x257/0x2d0
>   Workqueue: events_unbound commit_work
>   Call Trace:
>    __vmalloc_node_range_noprof+0x22b/0x570
>    __kvmalloc_node_noprof+0x3d0/0xb40
>    dc_create_plane_state+0x35/0x290 [amdgpu]
>    dc_state_create_phantom_plane+0x1a/0x120 [amdgpu]
>    dcn32_enable_phantom_plane+0x101/0x780 [amdgpu]
>    dcn32_add_phantom_pipes+0x47/0x460 [amdgpu]
>    dcn32_full_validate_bw_helper.constprop.0+0xa46/0x1d70 [amdgpu]
>    dcn32_internal_validate_bw+0x49c/0x1600 [amdgpu]
>    dml1_validate+0x20f/0x800 [amdgpu]
>    dcn32_validate_bandwidth+0x317/0x540 [amdgpu]
>    dc_validate_with_context+0xd34/0x1d30 [amdgpu]
>    dc_commit_streams+0x7ca/0x1810 [amdgpu]
>    amdgpu_dm_commit_streams+0xfd4/0x1e60 [amdgpu]
>    amdgpu_dm_atomic_commit_tail+0x29e/0x3520 [amdgpu]
>    commit_tail+0x204/0x4b0
>    process_one_work+0x8fd/0x16a0
>
> Per-CPU __preempt_count on the crashing CPU at panic time was 0x202:
> SOFTIRQ_DISABLE_OFFSET (0x200) from fpregs_lock() plus two preempt holds
> from dc_fpu_begin() and kernel_fpu_begin().
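
For anyone following along, the 0x202 value quoted above can be decoded by hand. Below is a minimal userspace sketch, assuming the mainline field layout from include/linux/preempt.h (8 preempt bits followed by 8 softirq bits). The decode_* and softirq_bits_set helper names are made up for illustration and do not exist in the kernel.

```c
#include <stdbool.h>

/* Field layout assumed from include/linux/preempt.h (mainline). */
#define PREEMPT_BITS            8
#define SOFTIRQ_BITS            8
#define SOFTIRQ_SHIFT           PREEMPT_BITS
#define PREEMPT_MASK            ((1u << PREEMPT_BITS) - 1)
#define SOFTIRQ_MASK            (((1u << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)
#define SOFTIRQ_OFFSET          (1u << SOFTIRQ_SHIFT)
#define SOFTIRQ_DISABLE_OFFSET  (2 * SOFTIRQ_OFFSET)

/* Hypothetical helper: number of nested preempt-disable holds. */
unsigned int decode_preempt_depth(unsigned int pc)
{
    return pc & PREEMPT_MASK;
}

/* Hypothetical helper: raw softirq field of the counter. */
unsigned int decode_softirq_field(unsigned int pc)
{
    return pc & SOFTIRQ_MASK;
}

/*
 * Hypothetical helper: true when the softirq field alone is non-zero,
 * which is one of the conditions a check like in_interrupt() looks at.
 */
bool softirq_bits_set(unsigned int pc)
{
    return decode_softirq_field(pc) != 0;
}
```

With pc = 0x202, decode_preempt_depth() gives 2 and decode_softirq_field() gives 0x200, i.e. one SOFTIRQ_DISABLE_OFFSET plus two preempt holds, matching the breakdown in the commit message.
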
>
> The DML2 paths already wrap their large vzalloc()s in
> DC_RUN_WITH_PREEMPTION_ENABLED() to handle this case (see
> drivers/gpu/drm/amd/display/dc/dml2_0/dml21/dml21_wrapper.c:26 and
> drivers/gpu/drm/amd/display/dc/dml2_0/dml2_wrapper.c:24). Apply the same
> guard to the DML1 phantom-plane allocation in dcn32_enable_phantom_plane().
>
> This is a separate class of issue from "drm/amd/display: Fix unsafe uses
> of kernel mode FPU" by Ard Biesheuvel, which addressed callers entering
> DC FP compilation units without DC_FP_START. The bug fixed here is the
> inverse: a sleeping allocator invoked from within an active DC_FP_START
> region.
>
> Reproducer (RX 7900 XTX, single 4K HDMI display, DCN 3.2): launch any
> workload that produces rapid atomic modeset commits. The most reliable
> trigger observed is launching Rise of the Tomb Raider via Proton and
> repeatedly pressing the Super key during the level loading screen;
> crash occurs within ~4 minutes uptime. Random crashes are also observed
> during routine fullscreen toggles (image viewers, chat applications).
>
> Hardware verified clean: memtest86+ 4 passes, stressapptest -W -m 32
> 4 hours, both pass with 0 errors. KASAN active, no reports under load.
>
> Fixes: 235c67634230 ("drm/amd/display: add DCN32/321 specific files for
> Display Core")
> Cc: [email protected] # v6.0+
> Signed-off-by: Mikhail Gavrilov <[email protected]>

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4470

Alex

> ---
>  .../drm/amd/display/dc/resource/dcn32/dcn32_resource.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
> b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
> index 82f81b586986..3751f7a94a05 100644
> --- a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
> +++ b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
> @@ -92,9 +92,14 @@
>  #include "dml/dcn32/dcn32_fpu.h"
>
>  #include "dc_state_priv.h"
> +#include "dc_fpu.h"
>
>  #include "dml2_0/dml2_wrapper.h"
>
> +#if !defined(DC_RUN_WITH_PREEMPTION_ENABLED)
> +#define DC_RUN_WITH_PREEMPTION_ENABLED(code) code
> +#endif
> +
>  #define DC_LOGGER_INIT(logger)
>
>  enum dcn32_clk_src_array_id {
> @@ -1684,7 +1689,8 @@ static void dcn32_enable_phantom_plane(struct dc *dc,
>                 if (curr_pipe->top_pipe && curr_pipe->top_pipe->plane_state
> == curr_pipe->plane_state)
>                         phantom_plane = prev_phantom_plane;
>                 else
> -                       phantom_plane = dc_state_create_phantom_plane(dc,
> context, curr_pipe->plane_state);
> +                       DC_RUN_WITH_PREEMPTION_ENABLED(phantom_plane =
> +                               dc_state_create_phantom_plane(dc, context,
> curr_pipe->plane_state));
>
>                 if (!phantom_plane)
>                         continue;
> --
> 2.54.0
>
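
To illustrate the shape of the guard the patch applies, here is a small userspace model. It is only a sketch: every mock_* and MOCK_* name is hypothetical, the FP region is modelled as a plain depth counter, and the macro body is an assumption about how a "leave the region, run the code, re-enter" wrapper might look, not the definition of DC_RUN_WITH_PREEMPTION_ENABLED in the driver. It also assumes a single nesting level.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for "inside a DC_FP_START region" state. */
static int mock_fp_depth;

static void mock_fp_start(void) { mock_fp_depth++; }
static void mock_fp_end(void)   { mock_fp_depth--; }

/*
 * Model of an allocator that may sleep: it refuses to run inside the
 * FP region. The kernel would hit a BUG_ON() here; the model returns NULL.
 */
static void *mock_sleeping_alloc(size_t size)
{
    if (mock_fp_depth > 0)
        return NULL;
    return calloc(1, size);
}

/* Leave the FP region around `code`, then re-enter it if we were inside. */
#define MOCK_RUN_WITH_PREEMPTION_ENABLED(code)  \
    do {                                        \
        bool was_in_fp = mock_fp_depth > 0;     \
        if (was_in_fp)                          \
            mock_fp_end();                      \
        code;                                   \
        if (was_in_fp)                          \
            mock_fp_start();                    \
    } while (0)

/* Allocate from inside an FP region, with or without the guard. */
void *alloc_in_fp_region(size_t size, bool guarded)
{
    void *p = NULL;

    mock_fp_start();
    if (guarded)
        MOCK_RUN_WITH_PREEMPTION_ENABLED(p = mock_sleeping_alloc(size));
    else
        p = mock_sleeping_alloc(size);
    mock_fp_end();

    return p;
}
```

In this model alloc_in_fp_region(64, false) fails the way the unguarded DML1 path does, while alloc_in_fp_region(64, true) succeeds and leaves the region state as it found it. The caller owns the returned buffer and should free() it.
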
