On Tue, 2025-09-09 at 17:21 +0200, Christian König wrote:
> On 09.09.25 16:49, Timur Kristóf wrote:
> > SDMA v3-v5 can copy almost 4 MiB in a single copy operation.
> > Use the same value as PAL and Mesa for copy_max_bytes.
> >
> > For reference, see oss2DmaCmdBuffer.cpp in PAL:
> > "Due to HW limitation, the maximum count may not be 2^n-1,
> > can only be 2^n - 1 - start_addr[4:2]"
>
> Ah! In this case the value the kernel uses is actually correct.
>
> The difference is that the kernel never has start_addr[4:2] != 0 for
> anything larger than PAGE_SIZE while for PAL and Mesa that can
> happen.
>
> > See also sid.h in Mesa:
> > "There is apparently an undocumented HW limitation that
> > prevents the HW from copying the last 255 bytes of (1 << 22) - 1"
>
> That is actually pretty well documented and makes perfect sense. For
> unaligned start or dst addresses the SDMA needs to use an internal
> bounce buffer. That's where the limit comes from.
>
> Not sure if we should apply that patch or not, it probably doesn't
> make any difference in practice.
>
> > Fixes: dfe5c2b76b2a ("drm/amdgpu: Correct bytes limit for SDMA 3.0
> > copy and fill")
>
> Even when we apply it I think we should drop that, the value the
> kernel uses is correct.

Hi Christian,

The kernel and Mesa disagree on the limits for almost all SDMA
versions, so it would be nice to actually understand what the limits
of the SDMA HW are and use the same limit in the kernel and Mesa, or
if that isn't viable, let's document why the different limits make
sense.

I'm adding Marek to CC, he wrote the comment that I referenced here.
As far as I understand from my conversation with Marek, the kernel is
actually wrong.

If the limits depend on alignment, then we should either set a limit
that is always safe, or make sure SDMA copies in the kernel are always
aligned and add assertions about it.

Looking at the implementation of amdgpu_copy_buffer in the kernel, I
see that it relies on copy_max_bytes and doesn't take alignment into
account, so with the current limit it could issue subsequent copies
that aren't 256 byte aligned.

Best regards,
Timur

>
> Regards,
> Christian.
>
> > Signed-off-by: Timur Kristóf <timur.kris...@gmail.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > index 1c076bd1cf73..9302cf0b5e4b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > @@ -1659,11 +1659,11 @@ static void
> > sdma_v3_0_emit_fill_buffer(struct amdgpu_ib *ib,
> >  }
> >
> >  static const struct amdgpu_buffer_funcs sdma_v3_0_buffer_funcs = {
> > -	.copy_max_bytes = 0x3fffe0, /* not 0x3fffff due to HW
> > limitation */
> > +	.copy_max_bytes = 0x3fff00, /* not 0x3fffff due to HW
> > limitation */
> >  	.copy_num_dw = 7,
> >  	.emit_copy_buffer = sdma_v3_0_emit_copy_buffer,
> >
> > -	.fill_max_bytes = 0x3fffe0, /* not 0x3fffff due to HW
> > limitation */
> > +	.fill_max_bytes = 0x3fff00, /* not 0x3fffff due to HW
> > limitation */
> >  	.fill_num_dw = 5,
> >  	.emit_fill_buffer = sdma_v3_0_emit_fill_buffer,
> >  };
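
To make the alignment point in the reply above concrete, here is a minimal sketch of how splitting a large copy into chunks of at most copy_max_bytes affects the start offset of each later chunk. It is not the kernel's amdgpu_copy_buffer; the helper name count_unaligned_chunks and the simple splitting loop are assumptions made for illustration, and the only thing checked is whether each chunk starts on a 256-byte boundary.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): split a copy of `total` bytes
 * starting at `start` into chunks of at most `max_bytes`, and count how
 * many chunks begin at an address that is not 256-byte aligned.
 */
static unsigned int count_unaligned_chunks(uint64_t start, uint64_t total,
					   uint32_t max_bytes)
{
	unsigned int unaligned = 0;
	uint64_t addr = start;

	while (total > 0) {
		/* Each chunk is capped at max_bytes, like copy_max_bytes. */
		uint64_t cur = total < max_bytes ? total : max_bytes;

		/* Low 8 bits set means the chunk start is not 256-byte aligned. */
		if (addr & 0xff)
			unaligned++;

		addr += cur;
		total -= cur;
	}

	return unaligned;
}
```

For a 16 MiB copy from a 256-byte-aligned start, a limit of 0x3fffe0 yields four chunks that start at unaligned offsets (0x3fffe0, 0x7fffc0, 0xbfffa0, 0xffff80), while a limit of 0x3fff00 yields none, because 0x3fff00 is itself a multiple of 256.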