On 7/20/23 22:48, Philip Yang wrote:
> On 2023-07-20 06:46, Michel Dänzer wrote:
>> On 7/17/23 15:09, Michel Dänzer wrote:
>>> On 5/10/23 23:23, Alex Deucher wrote:
>>>> From: Philip Yang <philip.y...@amd.com>
>>>>
>>>> Rename smv_migrate_init to a better name kgd2kfd_init_zone_device
>>>> because it setup zone devive pgmap for page migration and keep it in
>>>> kfd_migrate.c to access static functions svm_migrate_pgmap_ops. Call it
>>>> only once in amdgpu_device_ip_init after adev ip blocks are initialized,
>>>> but before amdgpu_amdkfd_device_init initialize kfd nodes which enable
>>>> SVM support based on pgmap.
>>>>
>>>> svm_range_set_max_pages is called by kgd2kfd_device_init everytime after
>>>> switching compute partition mode.
>>>>
>>>> Signed-off-by: Philip Yang <philip.y...@amd.com>
>>>> Reviewed-by: Felix Kuehling <felix.kuehl...@amd.com>
>>>> Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
>>> I bisected a regression to this commit, which broke HW acceleration on this
>>> ThinkPad E595 with Picasso APU.
>> Actually, it doesn't seem to break HW acceleration completely. GDM
>> eventually comes up with HW acceleration, it takes a long time (~30s or so)
>> to start up though.
>>
>> Later, the same messages as described in
>> https://gitlab.freedesktop.org/drm/amd/-/issues/2659 appear.
>>
>> Reverting this commit fixes all of the above symptoms.
>>
>>
>> I reproduced all of the above symptoms with amd-staging-drm-next commit
>> 75515acf4b60 ("i2c: nvidia-gpu: Add ACPI property to align with
>> device-tree") as well.
>>
>>
>> For full disclosure, I use these kernel command line arguments:
>>
>> fbcon=font:10x18 drm_kms_helper.drm_fbdev_overalloc=112 amdgpu.noretry=1
>> amdgpu.mcbp=1
>
> Thanks for the issue report and full disclosure, but I am not able to
> reproduce this issue, with both drm-next branch and amd-staging-drm-next
> branch tip on gitlab. The test system has same device id, running Ubuntu
> 22.04, latest linux-firmware-20230625.tar.gz, and same BIOS version.

FWIW, your system has PCI revision ID 0xC2, while mine has 0xC1. Also, I'm
currently using linux-firmware 20230515. AFAICT there are no relevant changes
in 20230625, but I'm attaching the contents of
/sys/kernel/debug/dri/0/amdgpu_firmware_info just in case.

> I attached full dmesg log, could you help check if there is other difference,
> maybe kernel config, gcc version... it is hard to guess what could cause the
> basic driver gfx ring IB test timeout.

I suspect the IOMMU page faults logged in my dmesg might be relevant:

 amdgpu: Topology: Add APU node [0x15d8:0x1002]
 amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x122201800 flags=0x0070]
 amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1125fe380 flags=0x0070]
 kfd kfd: amdgpu: added device 1002:15d8

There are no such page faults with the commit reverted. Other than that and
the IB test failure messages, our dmesg outputs are mostly identical indeed.


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

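
For anyone comparing dmesg logs with and without the commit, a small script can pull out the IO_PAGE_FAULT events. This is only a sketch: the regular expression is modelled on the two AMD-Vi lines quoted in this message and may need adjusting for other kernels, and the function name find_io_page_faults is made up for illustration.

```python
import re

# Pattern modelled on the AMD-Vi lines quoted in this thread (an assumption:
# other kernel versions may format these events differently).
FAULT_RE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AMD-Vi: Event logged "
    r"\[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) flags=(?P<flags>0x[0-9a-f]+)\]"
)


def find_io_page_faults(dmesg_text):
    """Return one dict per IO_PAGE_FAULT event found in the given dmesg text."""
    return [m.groupdict() for m in FAULT_RE.finditer(dmesg_text)]


# The four dmesg lines quoted in this message, used as sample input.
sample = """\
amdgpu: Topology: Add APU node [0x15d8:0x1002]
amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x122201800 flags=0x0070]
amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1125fe380 flags=0x0070]
kfd kfd: amdgpu: added device 1002:15d8
"""

faults = find_io_page_faults(sample)
for fault in faults:
    print(fault["dev"], fault["address"], fault["flags"])
```

Running the same function over a log captured with the commit reverted should return an empty list if the faults are indeed gone.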
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 53, firmware version: 0x000000a6
PFP feature version: 53, firmware version: 0x000000c2
CE feature version: 53, firmware version: 0x00000050
RLC feature version: 1, firmware version: 0x0000006f
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
RLCP feature version: 0, firmware version: 0x00000000
RLCV feature version: 0, firmware version: 0x00000000
MEC feature version: 53, firmware version: 0x000001d3
MEC2 feature version: 53, firmware version: 0x000001d3
IMU feature version: 0, firmware version: 0x00000000
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x21000090
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x1700002e
TA DTM feature version: 0x00000000, firmware version: 0x12000012
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x27000005
SMC feature version: 0, program: 0, firmware version: 0x00041e2a (4.30.42)
SDMA0 feature version: 41, firmware version: 0x000000a9
VCN feature version: 0, firmware version: 0x0210d004
DMCU feature version: 0, firmware version: 0x00000001
DMCUB feature version: 0, firmware version: 0x00000000
TOC feature version: 0, firmware version: 0x00000000
MES_KIQ feature version: 0, firmware version: 0x00000000
MES feature version: 0, firmware version: 0x00000000
VBIOS version: 113-PICASSO-114
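
To check two systems for firmware differences, the amdgpu_firmware_info dumps can be compared mechanically. The following is a rough sketch that assumes the line format shown in the attachment above. The helper names parse_firmware_info and diff_firmware are hypothetical, and the second dump in the example is invented purely to demonstrate the diff; it is not data from the other test system.

```python
import re

# Matches lines such as "ME feature version: 53, firmware version: 0x000000a6".
# Lines without a "feature version" field (e.g. the VBIOS line) are skipped.
LINE_RE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\S+?),"
    r".*firmware version: (?P<fw>0x[0-9a-fA-F]+)"
)


def parse_firmware_info(text):
    """Map each firmware block name to a (feature version, firmware version) pair."""
    versions = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            versions[m.group("name")] = (m.group("feature"), m.group("fw"))
    return versions


def diff_firmware(a, b):
    """Return the entries whose versions differ, or that exist in only one dump."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# A few lines taken from the attachment above.
mine = parse_firmware_info("""\
ME feature version: 53, firmware version: 0x000000a6
MEC feature version: 53, firmware version: 0x000001d3
SMC feature version: 0, program: 0, firmware version: 0x00041e2a (4.30.42)
VBIOS version: 113-PICASSO-114
""")

# Hypothetical second dump: the MEC version here is made up for illustration.
other = parse_firmware_info("""\
ME feature version: 53, firmware version: 0x000000a6
MEC feature version: 53, firmware version: 0x000001d4
SMC feature version: 0, program: 0, firmware version: 0x00041e2a (4.30.42)
VBIOS version: 113-PICASSO-114
""")

differences = diff_firmware(mine, other)
for name, (left, right) in differences.items():
    print(name, left, right)
```

In practice each argument to parse_firmware_info would be the contents of /sys/kernel/debug/dri/0/amdgpu_firmware_info read on the respective machine.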