On 11/6/2024 8:42 PM, Alex Deucher wrote:
> On Wed, Nov 6, 2024 at 1:49 AM Victor Zhao <victor.z...@amd.com> wrote:
>>
>> From: Monk Liu <monk....@amd.com>
>>
>> Cached GTT buffers are snooped, which guarantees coherence between CPU
>> writes and GPU fetches. The original code, however, uses WC + unsnooped
>> memory for the HIQ PQ (ring buffer), which introduces a coherency issue:
>> MEC fetches stale data from the PQ, which leads to a MEC hang.
>
> Can you elaborate on this? I can see CPU reads being slower because
> the memory is uncached, but the ring buffer is mostly writes anyway.
> IIRC, the driver uses USWC for most if not all of the other ring
> buffers managed by the kernel. Why aren't those a problem?

We have this on other rings:

	mb();
	amdgpu_ring_set_wptr(ring);

I think the solution should be to use a barrier before write pointer
updates rather than relying on PCIe snooping.
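
To illustrate the ordering I mean, here is a rough userspace sketch, not
the amdgpu code. All names (demo_ring, demo_ring_publish, demo_ring_fetch,
RING_ENTRIES) are hypothetical, and a C11 fence only stands in for the
kernel's mb(); it models the store ordering but not the write-combining
flush that matters for USWC memory on real hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES 8u /* hypothetical ring size */

struct demo_ring {
	uint32_t buf[RING_ENTRIES]; /* packet storage, written by the producer */
	_Atomic uint32_t wptr;      /* write pointer, polled by the consumer */
	_Atomic uint32_t rptr;      /* read pointer, advanced by the consumer */
};

/* Producer: write the packet first, then publish it by bumping wptr. */
static void demo_ring_publish(struct demo_ring *r, uint32_t packet)
{
	uint32_t w = atomic_load_explicit(&r->wptr, memory_order_relaxed);
	uint32_t rd = atomic_load_explicit(&r->rptr, memory_order_acquire);

	assert(w - rd < RING_ENTRIES); /* the ring must not be full */
	r->buf[w % RING_ENTRIES] = packet;

	/*
	 * Stand-in for mb(): the packet store above is ordered before the
	 * wptr store below, so a consumer that sees the new wptr also sees
	 * the packet.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&r->wptr, w + 1, memory_order_relaxed);
}

/* Consumer: returns 1 and fills *out if a packet is available, else 0. */
static int demo_ring_fetch(struct demo_ring *r, uint32_t *out)
{
	uint32_t w = atomic_load_explicit(&r->wptr, memory_order_acquire);
	uint32_t rd = atomic_load_explicit(&r->rptr, memory_order_relaxed);

	if (rd == w)
		return 0; /* nothing published yet */

	*out = r->buf[rd % RING_ENTRIES];
	atomic_store_explicit(&r->rptr, rd + 1, memory_order_release);
	return 1;
}
```

Without the fence, the wptr store could become visible before the packet
store, and the consumer would fetch whatever stale contents the slot still
held, which is the failure described in the commit message.
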
Thanks,
Lijo
>
> Alex
>
>>
>> Signed-off-by: Monk Liu <monk....@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> index 1f1d79ac5e6c..fb087a0ff5bc 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> @@ -779,7 +779,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>>  	if (amdgpu_amdkfd_alloc_gtt_mem(
>>  			kfd->adev, size, &kfd->gtt_mem,
>>  			&kfd->gtt_start_gpu_addr, &kfd->gtt_start_cpu_ptr,
>> -			false, true)) {
>> +			false, false)) {
>>  		dev_err(kfd_device, "Could not allocate %d bytes\n", size);
>>  		goto alloc_gtt_mem_failure;
>>  	}
>> --
>> 2.34.1
>>