On 2020-09-01 10:12 a.m., Alex Deucher wrote:
> On Tue, Sep 1, 2020 at 3:46 AM Pan, Xinhui <xinhui....@amd.com> wrote:
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> drm_dev_alloc() alloc *dev* and set managed.final_kfree to dev to free
>> itself.
>> Now from commit 5cdd68498918("drm/amdgpu: Embed drm_device into
>> amdgpu_device (v3)") we alloc *adev* and ddev is just a member of it.
>> So drm_dev_release try to free a wrong pointer then.
>>
>> Also driver's release trys to free adev, but drm_dev_release will
>> access dev after call drvier's release.
>>
>> To fix it, remove driver's release and set managed.final_kfree to adev.
> 
> I've got to admit, the documentation around drm_dev_init is hard to
> follow.  We aren't really using the drmm stuff, but you have to use
> drmm_add_final_kfree to avoid a warning.  The logic seems to make
> sense though.
> Acked-by: Alex Deucher <alexancer.deuc...@amd.com>

The logic in patch 3/3 uses the kref infrastructure
as described in drm_drv.c's comment on what the DRM
usage is, i.e. taking advantage of the kref infrastructure.

In amdgpu_pci_probe() we call drm_dev_init() which takes
a ref of 1 on the kref in the DRM device structure,
and from then on, only when the kref transitions
from non-zero to 0, do we free the container of
DRM device, and this is beautifully shown in the
kernel stack below (please take a look at the kernel
stack below).

Using a kref is very powerful as it is implicit:
when the kref transitions from non-zero to 0,
then call the release method.

Furthermore, we own the release method, and we
like that, as it is pure, as well as,
there may be more things we'd like to do in the future
before we free the amdgpu device: maybe free memory we're
using specifically for that PCI device, maybe write
some registers, maybe notify someone or something, etc.

On another note, adding "drmm_add_final_kfree()" in the middle
of amdgpu_pci_probe() seems hackish--it's neither part
of drm_dev_init() nor of drm_dev_register(). We really
don't need it, since we rely on the kref infrastructure
to tell us when to free the device, and if you'd look
at the beautiful stack below, it knows exactly when that is,
i.e. when to free it.

The correct thing to do this is to
_leave the amdgpu_driver_release()_ alone,
remove "drmm_add_final_kfree()" and qualify
the WARN_ON() in drm_dev_register() by
the existence of drm_driver.release() (i.e. non-NULL).

I'll submit a sequence of patches to fix this right.

Regards,
Luben

> 
>>
>> [   36.269348] BUG: unable to handle page fault for address: ffffa0c279940028
>> [   36.276841] #PF: supervisor read access in kernel mode
>> [   36.282434] #PF: error_code(0x0000) - not-present page
>> [   36.288053] PGD 676601067 P4D 676601067 PUD 86a414067 PMD 86a247067 PTE 
>> 800ffff8066bf060
>> [   36.296868] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
>> [   36.302409] CPU: 4 PID: 1375 Comm: bash Tainted: G           O      
>> 5.9.0-rc2+ #46
>> [   36.310670] Hardware name: System manufacturer System Product Name/PRIME 
>> Z390-A, BIOS 1401 11/26/2019
>> [   36.320725] RIP: 0010:drm_managed_release+0x25/0x110 [drm]
>> [   36.326741] Code: 80 00 00 00 00 0f 1f 44 00 00 55 48 c7 c2 5a 9f 41 c0 
>> be 00 02 00 00 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 08 <48> 
>> 8b 7f 18 e8 c2 10 ff ff 4d 8b 74 24 20 49 8d 44 24 5
>> [   36.347217] RSP: 0018:ffffb9424141fce0 EFLAGS: 00010282
>> [   36.352931] RAX: 0000000000000006 RBX: ffffa0c279940010 RCX: 
>> 0000000000000006
>> [   36.360718] RDX: ffffffffc0419f5a RSI: 0000000000000200 RDI: 
>> ffffa0c279940010
>> [   36.368503] RBP: ffffb9424141fd10 R08: 0000000000000001 R09: 
>> 0000000000000001
>> [   36.376304] R10: 0000000000000000 R11: 0000000000000000 R12: 
>> ffffa0c279940010
>> [   36.384070] R13: ffffffffc0e2a000 R14: ffffa0c26924e220 R15: 
>> fffffffffffffff2
>> [   36.391845] FS:  00007fc4a277b740(0000) GS:ffffa0c288e00000(0000) 
>> knlGS:0000000000000000
>> [   36.400669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   36.406937] CR2: ffffa0c279940028 CR3: 0000000792304006 CR4: 
>> 00000000003706e0
>> [   36.414732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>> 0000000000000000
>> [   36.422550] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
>> 0000000000000400
>> [   36.430354] Call Trace:
>> [   36.433044]  drm_dev_put.part.0+0x40/0x60 [drm]
>> [   36.438017]  drm_dev_put+0x13/0x20 [drm]
>> [   36.442398]  amdgpu_pci_remove+0x56/0x60 [amdgpu]
>> [   36.447528]  pci_device_remove+0x3e/0xb0
>> [   36.451807]  device_release_driver_internal+0xff/0x1d0
>> [   36.457416]  device_release_driver+0x12/0x20
>> [   36.462094]  pci_stop_bus_device+0x70/0xa0
>> [   36.466588]  pci_stop_and_remove_bus_device_locked+0x1b/0x30
>> [   36.472786]  remove_store+0x7b/0x90
>> [   36.476614]  dev_attr_store+0x17/0x30
>> [   36.480646]  sysfs_kf_write+0x4b/0x60
>> [   36.484655]  kernfs_fop_write+0xe8/0x1d0
>> [   36.488952]  vfs_write+0xf5/0x230
>> [   36.492562]  ksys_write+0x70/0xf0
>> [   36.496206]  __x64_sys_write+0x1a/0x20
>> [   36.500292]  do_syscall_64+0x38/0x90
>> [   36.504219]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Signed-off-by: xinhui pan <xinhui....@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 +---------
>>  1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index c12e9acd421d..52fc0c6c8538 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -1165,7 +1165,7 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>  if (ret)
>>  goto err_free;
>>
>> -drmm_add_final_kfree(ddev, ddev);
>> +drmm_add_final_kfree(ddev, adev);
>>
>>  if (!supports_atomic)
>>  ddev->driver_features &= ~DRIVER_ATOMIC;
>> @@ -1221,13 +1221,6 @@ amdgpu_pci_remove(struct pci_dev *pdev)
>>  drm_dev_put(dev);
>>  }
>>
>> -static void amdgpu_driver_release(struct drm_device *ddev)
>> -{
>> -struct amdgpu_device *adev = drm_to_adev(ddev);
>> -
>> -kfree(adev);
>> -}
>> -
>>  static void
>>  amdgpu_pci_shutdown(struct pci_dev *pdev)
>>  {
>> @@ -1522,7 +1515,6 @@ static struct drm_driver kms_driver = {
>>  .open = amdgpu_driver_open_kms,
>>  .postclose = amdgpu_driver_postclose_kms,
>>  .lastclose = amdgpu_driver_lastclose_kms,
>> -.release   = amdgpu_driver_release,
>>  .irq_handler = amdgpu_irq_handler,
>>  .ioctls = amdgpu_ioctls_kms,
>>  .gem_free_object_unlocked = amdgpu_gem_object_free,
>> --
>> 2.25.1
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=02%7C01%7CLuben.Tuikov%40amd.com%7C5233b0caaac449417a7208d84e810de0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637345663485193534&amp;sdata=A%2FzJpoQmUhGBGts5mQfp%2BlVu1FL5NVbe5qfRelb2Og8%3D&amp;reserved=0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to