On 7/20/23 22:48, Philip Yang wrote:
> On 2023-07-20 06:46, Michel Dänzer wrote:
>> On 7/17/23 15:09, Michel Dänzer wrote:
>>> On 5/10/23 23:23, Alex Deucher wrote:
>>>> From: Philip Yang <philip.y...@amd.com>
>>>>
>>>> Rename svm_migrate_init to a better name, kgd2kfd_init_zone_device,
>>>> because it sets up the zone device pgmap for page migration, and keep it
>>>> in kfd_migrate.c to access the static svm_migrate_pgmap_ops. Call it
>>>> only once in amdgpu_device_ip_init after the adev IP blocks are
>>>> initialized, but before amdgpu_amdkfd_device_init initializes the KFD
>>>> nodes, which enable SVM support based on the pgmap.
>>>>
>>>> svm_range_set_max_pages is called by kgd2kfd_device_init every time
>>>> after switching the compute partition mode.
>>>>
>>>> Signed-off-by: Philip Yang <philip.y...@amd.com>
>>>> Reviewed-by: Felix Kuehling <felix.kuehl...@amd.com>
>>>> Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
>>> I bisected a regression to this commit, which broke HW acceleration on this 
>>> ThinkPad E595 with Picasso APU.
>> Actually, it doesn't seem to break HW acceleration completely. GDM 
>> eventually comes up with HW acceleration, but it takes a long time (~30s) 
>> to start up.
>>
>> Later, the same messages as described in 
>> https://gitlab.freedesktop.org/drm/amd/-/issues/2659 appear.
>>
>> Reverting this commit fixes all of the above symptoms.
>>
>>
>> I reproduced all of the above symptoms with amd-staging-drm-next commit 
>> 75515acf4b60 ("i2c: nvidia-gpu: Add ACPI property to align with 
>> device-tree") as well.
>>
>>
>> For full disclosure, I use these kernel command line arguments:
>>
>>  fbcon=font:10x18 drm_kms_helper.drm_fbdev_overalloc=112 amdgpu.noretry=1 amdgpu.mcbp=1
> 
> Thanks for the issue report and full disclosure, but I am not able to 
> reproduce this issue with either the drm-next or the amd-staging-drm-next 
> branch tip on GitLab. The test system has the same device ID and the same 
> BIOS version, running Ubuntu 22.04 with the latest linux-firmware-20230625.tar.gz.

FWIW, your system has PCI revision ID 0xC2, while mine has 0xC1.

Also, I'm currently using linux-firmware 20230515. AFAICT there are no relevant 
changes in 20230625, but I'm attaching the contents of 
/sys/kernel/debug/dri/0/amdgpu_firmware_info just in case.


> I attached the full dmesg log. Could you help check whether there are other 
> differences, e.g. kernel config or GCC version? It is hard to guess what could 
> cause the basic driver gfx ring IB test timeout.

I suspect the IOMMU page faults logged in my dmesg might be relevant:

 amdgpu: Topology: Add APU node [0x15d8:0x1002]
 amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x122201800 flags=0x0070]
 amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1125fe380 flags=0x0070]
 kfd kfd: amdgpu: added device 1002:15d8

There are no such page faults with the commit reverted.

Other than that and the IB test failure messages, our dmesg outputs are mostly 
identical indeed.


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 53, firmware version: 0x000000a6
PFP feature version: 53, firmware version: 0x000000c2
CE feature version: 53, firmware version: 0x00000050
RLC feature version: 1, firmware version: 0x0000006f
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
RLCP feature version: 0, firmware version: 0x00000000
RLCV feature version: 0, firmware version: 0x00000000
MEC feature version: 53, firmware version: 0x000001d3
MEC2 feature version: 53, firmware version: 0x000001d3
IMU feature version: 0, firmware version: 0x00000000
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x21000090
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x1700002e
TA DTM feature version: 0x00000000, firmware version: 0x12000012
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x27000005
SMC feature version: 0, program: 0, firmware version: 0x00041e2a (4.30.42)
SDMA0 feature version: 41, firmware version: 0x000000a9
VCN feature version: 0, firmware version: 0x0210d004
DMCU feature version: 0, firmware version: 0x00000001
DMCUB feature version: 0, firmware version: 0x00000000
TOC feature version: 0, firmware version: 0x00000000
MES_KIQ feature version: 0, firmware version: 0x00000000
MES feature version: 0, firmware version: 0x00000000
VBIOS version: 113-PICASSO-114