Am 18.07.2014 11:44, schrieb Michel D?nzer: > On 18.07.2014 15:56, Christian K?nig wrote: >> From: Christian K?nig <christian.koenig at amd.com> >> >> v3: completely rewritten. We now just remember which areas >> of the PT to clear and do so on the next command submission. >> >> Bug: https://bugs.freedesktop.org/show_bug.cgi?id=79980 >> >> Signed-off-by: Christian K?nig <christian.koenig at amd.com> > [...] > >> diff --git a/drivers/gpu/drm/radeon/radeon_vm.c >> b/drivers/gpu/drm/radeon/radeon_vm.c >> index eecff6b..2726b46 100644 >> --- a/drivers/gpu/drm/radeon/radeon_vm.c >> +++ b/drivers/gpu/drm/radeon/radeon_vm.c >> @@ -468,6 +469,15 @@ int radeon_vm_bo_set_addr(struct radeon_device *rdev, >> head = &tmp->vm_list; >> } >> >> + if (bo_va->soffset) { >> + /* add a clone of the bo_va to clear the old address */ >> + tmp = kzalloc(sizeof(struct radeon_bo_va), GFP_KERNEL); >> + tmp->soffset = bo_va->soffset; >> + tmp->eoffset = bo_va->eoffset; >> + tmp->vm = vm; > + INIT_LIST_HEAD(&tmp->vm_status); > > ?
Unnecessary, would be overwritten by the following list_add anyway. >> + list_add(&tmp->vm_status, &vm->freed); >> + } > [...] > >> -int radeon_vm_bo_rmv(struct radeon_device *rdev, >> - struct radeon_bo_va *bo_va) >> +void radeon_vm_bo_rmv(struct radeon_device *rdev, >> + struct radeon_bo_va *bo_va) >> { >> - int r = 0; >> + struct radeon_vm *vm = bo_va->vm; >> >> - mutex_lock(&bo_va->vm->mutex); >> - if (bo_va->soffset) >> - r = radeon_vm_bo_update(rdev, bo_va->vm, bo_va->bo, NULL); >> + list_del(&bo_va->bo_list); >> >> + mutex_lock(&vm->mutex); >> list_del(&bo_va->vm_list); >> - mutex_unlock(&bo_va->vm->mutex); >> - list_del(&bo_va->bo_list); > Was there any particular reason for moving the list_del(&bo_va->bo_list) > outside of the VM mutex? I suspect this might be the cause of the > problem below, which I encountered after a few piglit runs. Ah! Well it's not the root cause of the problem cause we released that lock before as well. But it changed the timing quite a bit and so brought the problem to the surface. The root cause is that we don't reserve the IB-BO on every command submission, and so a concurrent creating or tear-down of a VM could modify the bo_list without holding a lock. Thanks for the Info, going to address this in a second patch. Regards, Christian. > > > Other than those two issues, the patch looks good to me, and I haven't > seen any piglit GPU lockups with it on my Bonaire yet. > > > general protection fault: 0000 [#1] SMP 153 > Modules linked in: bnep bluetooth rfkill snd_hrtimer snd_seq snd_seq_device > cpufreq_stats cpufreq_powersave cpufreq_userspace cpufreq_conservative > binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc > hid_generic usbhid hid snd_hda_codec_hdmi evdev snd_hda_codec_realtek > snd_hda_codec_generic nls_utf8 nls_cp437 vfat fat kvm_amd kvm crc32_pclmul > crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul > glue_helper ablk_helper cryptd microcode ohci_pci psmouse serio_raw pcspkr > edac_mce_amd k10temp edac_core r8169 mii snd_hda_intel snd_hda_controller > snd_hda_codec snd_hwdep snd_pcm snd_timer radeon(O) ttm snd sg drm_kms_helper > i2c_piix4 ohci_hcd ehci_pci drm ehci_hcd soundcore i2c_algo_bit i2c_core > acpi_cpufreq xhci_hcd video usbcore usb_common processor wmi button > thermal_sys fuse autofs4 ext4 crc16 mbcache jbd2 dm_mod sd_mod crc_t10dif > crct10dif_generic crct10dif_pclmul crct10dif_common ahci libahci libata > scsi_mod > CPU: 2 PID: 14721 Comm: shader_runner Tainted: G O 3.16.0-rc5+ > #143 > Hardware name: System manufacturer System Product Name/A88X-PRO, BIOS 1001 > 04/01/2014 > task: ffff8800083e4b10 ti: ffff880103bb8000 task.ti: ffff880103bb8000 > RIP: 0010:[<ffffffffc0573672>] [<ffffffffc0573672>] > radeon_vm_bo_find+0x13/0x22 [radeon] > RSP: 0018:ffff880103bbbbf0 EFLAGS: 00010297 > RAX: dead000000100100 RBX: ffff8800dba24000 RCX: ef7bdef7bdef7bdf > RDX: 000000000000a7b3 RSI: ffff8802120a9bb8 RDI: ffff8801f1d4f800 > RBP: ffff880103bbbbf0 R08: 0000000000000000 R09: ffffffff00000000 > R10: ffff880103bbba80 R11: ffff8800dba25eb0 R12: ffff880103bbbc18 > R13: ffff8801f1d4f800 R14: ffff8800dba24000 R15: 0000000000000000 > FS: 00007f640e4b7700(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f35bc858000 CR3: 00000001024a1000 CR4: 00000000000407e0 > Stack: > ffff880103bbbdc0 ffffffffc04fb879 ffff8801f1d4f848 ffff880103bbbdf0 > ffff8800dba24018 ffff880215228098 ffff8800dba24000 ffff880003207c00 > 0000000000000003 ffff8801f9c399c0 ffff8801fa08a120 0000001700000000 > Call Trace: > [<ffffffffc04fb879>] radeon_cs_ioctl+0x4eb/0x677 [radeon] > [<ffffffffc042638e>] drm_ioctl+0x366/0x42c [drm] > [...] > >