On Tue, 2025-09-09 at 17:21 +0200, Christian König wrote:
> On 09.09.25 16:49, Timur Kristóf wrote:
> > SDMA v3-v5 can copy almost 4 MiB in a single copy operation.
> > Use the same value as PAL and Mesa for copy_max_bytes.
> > 
> > For reference, see oss2DmaCmdBuffer.cpp in PAL:
> > "Due to HW limitation, the maximum count may not be 2^n-1,
> > can only be 2^n - 1 - start_addr[4:2]"
> 
> Ah! In this case the value the kernel uses is actually correct.
> 
> The difference is that the kernel never has start_addr[4:2] != 0 for
> anything larger than PAGE_SIZE while for PAL and Mesa that can
> happen.
> 
> > See also sid.h in Mesa:
> > "There is apparently an undocumented HW limitation that
> > prevents the HW from copying the last 255 bytes of (1 << 22) - 1"
> 
> That is actually pretty well documented and makes perfect sense. For
> unaligned start or dst addresses the SDMA needs to use an internal
> bounce buffer. That's where the limit comes from.
> 
> Not sure if we should apply that patch or not, it probably doesn't
> make any difference in practice.
> 
> > Fixes: dfe5c2b76b2a ("drm/amdgpu: Correct bytes limit for SDMA 3.0
> > copy and fill")
> 
> Even when we apply it I think we should drop that, the value the
> kernel uses is correct.

Hi Christian,

The kernel and Mesa disagree on the limits for almost all SDMA
versions, so it would be nice to actually understand what the limits of
the SDMA HW are and use the same limit in the kernel and Mesa, or if
that isn't viable, let's document why the different limits make sense.

I'm adding Marek to CC, he wrote the comment that I referenced here.
As far as I understand from my conversation with Marek, the kernel is
actually wrong.

If the limits depend on alignment, then we should either set a limit
that is always safe, or make sure SDMA copies in the kernel are always
aligned and add assertions about it. Looking at the implementation of
amdgpu_copy_buffer in the kernel, I see that it relies on
copy_max_bytes and doesn't take alignment into account, so with the
current limit it could issue subsequent copies that aren't 256-byte
aligned.
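
To make the alignment concern concrete, here is a minimal standalone
sketch. It is not the actual amdgpu_copy_buffer code:
chunks_stay_256b_aligned is a hypothetical helper, and it assumes the
copy is simply split into back-to-back chunks of at most copy_max_bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: split a copy of `total` bytes
 * starting at address `start` into back-to-back chunks of at most
 * `max_bytes`, and report whether every chunk begins on a 256-byte
 * boundary.
 */
static bool chunks_stay_256b_aligned(uint64_t start, uint64_t total,
				     uint32_t max_bytes)
{
	uint64_t offset = 0;

	assert(max_bytes > 0);

	while (offset < total) {
		uint64_t remaining = total - offset;
		uint64_t cur = remaining < max_bytes ? remaining : max_bytes;

		/* The low 8 bits must be zero for 256-byte alignment. */
		if ((start + offset) & 0xff)
			return false;

		offset += cur;
	}

	return true;
}
```

Under that assumption, an 8 MiB copy starting at address 0 with a limit
of 0x3fffe0 places the second chunk at offset 0x3fffe0, which is not
256-byte aligned. With 0x3fff00 (that is, 0x3fffff minus the 255 bytes
mentioned in the Mesa comment) every chunk start stays aligned.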

Best regards,
Timur


> 
> Regards,
> Christian.
> 
> > Signed-off-by: Timur Kristóf <timur.kris...@gmail.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > index 1c076bd1cf73..9302cf0b5e4b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > @@ -1659,11 +1659,11 @@ static void
> > sdma_v3_0_emit_fill_buffer(struct amdgpu_ib *ib,
> >  }
> >  
> >  static const struct amdgpu_buffer_funcs sdma_v3_0_buffer_funcs = {
> > -   .copy_max_bytes = 0x3fffe0, /* not 0x3fffff due to HW
> > limitation */
> > +   .copy_max_bytes = 0x3fff00, /* not 0x3fffff due to HW
> > limitation */
> >     .copy_num_dw = 7,
> >     .emit_copy_buffer = sdma_v3_0_emit_copy_buffer,
> >  
> > -   .fill_max_bytes = 0x3fffe0, /* not 0x3fffff due to HW
> > limitation */
> > +   .fill_max_bytes = 0x3fff00, /* not 0x3fffff due to HW
> > limitation */
> >     .fill_num_dw = 5,
> >     .emit_fill_buffer = sdma_v3_0_emit_fill_buffer,
> >  };
