Applied. Thanks!

Alex

On Tue, Mar 24, 2026 at 11:00 PM Kuehling, Felix <[email protected]> wrote:
>
>
> On 2026-03-23 00:28, Donet Tom wrote:
> > The control stack size is calculated based on the number of CUs and
> > waves, and is then aligned to PAGE_SIZE. When the resulting control
> > stack size is aligned to 64 KB, GPU hangs and queue preemption
> > failures are observed while running RCCL unit tests on systems with
> > more than two GPUs.
> >
> > amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
> > doorbell_id: 80030008
> > amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
> > amdgpu 0048:0f:00.0: amdgpu: GPU reset begin!. Source: 4
> > amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
> > doorbell_id: 80030008
> > amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
> > amdgpu 0048:0f:00.0: amdgpu: Failed to restore process queues
> >
> > This issue is observed on both 4 KB and 64 KB system page-size
> > configurations.
> >
> > This patch fixes the issue by aligning the control stack size to
> > AMDGPU_GPU_PAGE_SIZE instead of PAGE_SIZE, so the control stack size
> > will not be 64 KB on systems with a 64 KB page size and queue
> > preemption works correctly.
> >
> > Additionally, in the current code, wg_data_size is aligned to PAGE_SIZE,
> > which can waste memory if the system page size is large. In this patch,
> > wg_data_size is aligned to AMDGPU_GPU_PAGE_SIZE. The cwsr_size, calculated
> > from wg_data_size and the control stack size, is aligned to PAGE_SIZE.
> >
> > Signed-off-by: Donet Tom <[email protected]>
>
> Reviewed-by: Felix Kuehling <[email protected]>
>
>
> > ---
> >  drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
> > index 572b21e39e83..9d4838461168 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
> > @@ -492,10 +492,11 @@ void kfd_queue_ctx_save_restore_size(struct
> > kfd_topology_device *dev)
> >                 cu_num = props->simd_count / props->simd_per_cu /
> > NUM_XCC(dev->gpu->xcc_mask);
> >         wave_num = get_num_waves(props, gfxv, cu_num);
> >
> > -       wg_data_size = ALIGN(cu_num * WG_CONTEXT_DATA_SIZE_PER_CU(gfxv,
> > props), PAGE_SIZE);
> > +       wg_data_size = ALIGN(cu_num * WG_CONTEXT_DATA_SIZE_PER_CU(gfxv,
> > props),
> > +                               AMDGPU_GPU_PAGE_SIZE);
> >         ctl_stack_size = wave_num * CNTL_STACK_BYTES_PER_WAVE(gfxv) + 8;
> >         ctl_stack_size = ALIGN(SIZEOF_HSA_USER_CONTEXT_SAVE_AREA_HEADER +
> > ctl_stack_size,
> > -                              PAGE_SIZE);
> > +                              AMDGPU_GPU_PAGE_SIZE);
> >
> >         if ((gfxv / 10000 * 10000) == 100000) {
> >                 /* HW design limits control stack size to 0x7000.
> > @@ -507,7 +508,7 @@ void kfd_queue_ctx_save_restore_size(struct
> > kfd_topology_device *dev)
> >
> >         props->ctl_stack_size = ctl_stack_size;
> >         props->debug_memory_size = ALIGN(wave_num * DEBUGGER_BYTES_PER_WAVE,
> > DEBUGGER_BYTES_ALIGN);
> > -       props->cwsr_size = ctl_stack_size + wg_data_size;
> > +       props->cwsr_size = ALIGN(ctl_stack_size + wg_data_size, PAGE_SIZE);
> >
> >         if (gfxv == 80002)      /* GFX_VERSION_TONGA */
> >                 props->eop_buffer_size = 0x8000;
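
For anyone following along, the effect of the alignment change can be sketched in user space roughly as below. This is only an illustration under stated assumptions, not the driver code: align_up(), struct cwsr_sizes, sizes_before() and sizes_after() are hypothetical names, ctl_raw stands for the save-area header plus the per-wave control stack bytes already summed, and the GFX10 0x7000 clamp from the real function is left out. GPU_PAGE_SIZE mirrors the 4096-byte AMDGPU_GPU_PAGE_SIZE.

```c
#include <stdint.h>

/* Mirrors AMDGPU_GPU_PAGE_SIZE (4 KB); the name here is illustrative. */
#define GPU_PAGE_SIZE 4096u

/* Round x up to a multiple of a. Assumes a is a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical container for the three sizes discussed in the patch. */
struct cwsr_sizes {
	uint32_t ctl_stack_size;
	uint32_t wg_data_size;
	uint32_t cwsr_size;
};

/* Before the patch: both parts are aligned to the system page size. */
static struct cwsr_sizes sizes_before(uint32_t ctl_raw, uint32_t wg_raw,
				      uint32_t page_size)
{
	struct cwsr_sizes s;

	s.ctl_stack_size = align_up(ctl_raw, page_size);
	s.wg_data_size = align_up(wg_raw, page_size);
	s.cwsr_size = s.ctl_stack_size + s.wg_data_size;
	return s;
}

/*
 * After the patch: both parts are aligned to the GPU page size, and only
 * the total is rounded up to the system page size.
 */
static struct cwsr_sizes sizes_after(uint32_t ctl_raw, uint32_t wg_raw,
				     uint32_t page_size)
{
	struct cwsr_sizes s;

	s.ctl_stack_size = align_up(ctl_raw, GPU_PAGE_SIZE);
	s.wg_data_size = align_up(wg_raw, GPU_PAGE_SIZE);
	s.cwsr_size = align_up(s.ctl_stack_size + s.wg_data_size, page_size);
	return s;
}
```

With made-up inputs of ctl_raw = 20000 and wg_raw = 300000 on a 64 KB page system, sizes_before() rounds the control stack up to 65536 bytes, while sizes_after() keeps it at 20480 bytes and still returns a cwsr_size that is a multiple of the system page size.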
