Re: Lockdep spalt on killing a processes

2021-11-01 Thread Andrey Grodzovsky
Pushed to drm-misc-next Andrey On 2021-10-29 3:07 a.m., Christian König wrote: Attached a patch. Give it a try please, I tested it on my side and tried to generate the right conditions to trigger this code path by repeatedly submitting commands while issuing GPU reset to stop the scheduler

Re: Lockdep spalt on killing a processes

2021-10-29 Thread Christian König
On 28.10.21 at 19:26, Andrey Grodzovsky wrote: On 2021-10-27 3:58 p.m., Andrey Grodzovsky wrote: On 2021-10-27 10:50 a.m., Christian König wrote: On 27.10.21 at 16:47, Andrey Grodzovsky wrote: On 2021-10-27 10:34 a.m., Christian König wrote: On 27.10.21 at 16:27, Andrey Grodzovsky wrote

Re: Lockdep spalt on killing a processes

2021-10-28 Thread Andrey Grodzovsky
On 2021-10-27 3:58 p.m., Andrey Grodzovsky wrote: On 2021-10-27 10:50 a.m., Christian König wrote: On 27.10.21 at 16:47, Andrey Grodzovsky wrote: On 2021-10-27 10:34 a.m., Christian König wrote: On 27.10.21 at 16:27, Andrey Grodzovsky wrote: [SNIP] Let me please know if I am still mis

Re: Lockdep spalt on killing a processes

2021-10-27 Thread Andrey Grodzovsky
On 2021-10-27 10:50 a.m., Christian König wrote: On 27.10.21 at 16:47, Andrey Grodzovsky wrote: On 2021-10-27 10:34 a.m., Christian König wrote: On 27.10.21 at 16:27, Andrey Grodzovsky wrote: [SNIP] Let me please know if I am still missing some point of yours. Well, I mean we need to

Re: Lockdep spalt on killing a processes

2021-10-27 Thread Christian König
On 27.10.21 at 16:47, Andrey Grodzovsky wrote: On 2021-10-27 10:34 a.m., Christian König wrote: On 27.10.21 at 16:27, Andrey Grodzovsky wrote: [SNIP] Let me please know if I am still missing some point of yours. Well, I mean we need to be able to handle this for all drivers. For sure

Re: Lockdep spalt on killing a processes

2021-10-27 Thread Andrey Grodzovsky
On 2021-10-27 10:34 a.m., Christian König wrote: On 27.10.21 at 16:27, Andrey Grodzovsky wrote: [SNIP] Let me please know if I am still missing some point of yours. Well, I mean we need to be able to handle this for all drivers. For sure, but as I said above in my opinion we need to c

Re: Lockdep spalt on killing a processes

2021-10-27 Thread Christian König
On 27.10.21 at 16:27, Andrey Grodzovsky wrote: [SNIP] Let me please know if I am still missing some point of yours. Well, I mean we need to be able to handle this for all drivers. For sure, but as I said above in my opinion we need to change only for those drivers that don't use the _lo

Re: Lockdep spalt on killing a processes

2021-10-27 Thread Andrey Grodzovsky
On 2021-10-26 6:54 a.m., Christian König wrote: On 26.10.21 at 04:33, Andrey Grodzovsky wrote: On 2021-10-25 3:56 p.m., Christian König wrote: In general I'm all there to get this fixed, but there is one major problem: Drivers don't expect the lock to be dropped. I am probably missing so

Re: Lockdep spalt on killing a processes

2021-10-26 Thread Christian König
On 26.10.21 at 04:33, Andrey Grodzovsky wrote: On 2021-10-25 3:56 p.m., Christian König wrote: In general I'm all there to get this fixed, but there is one major problem: Drivers don't expect the lock to be dropped. I am probably missing something but in my approach we only modify the code

Re: Lockdep spalt on killing a processes

2021-10-25 Thread Andrey Grodzovsky
On 2021-10-25 3:56 p.m., Christian König wrote: In general I'm all there to get this fixed, but there is one major problem: Drivers don't expect the lock to be dropped. I am probably missing something but in my approach we only modify the code for those clients that call dma_fence_signal, no

Re: Lockdep spalt on killing a processes

2021-10-25 Thread Christian König
In general I'm all there to get this fixed, but there is one major problem: Drivers don't expect the lock to be dropped. What we could do is to change all drivers so they always call the dma_fence_signal functions and drop the _locked variants. This way we could move calling the callback

Re: Lockdep spalt on killing a processes

2021-10-25 Thread Andrey Grodzovsky
Adding back Daniel (somehow he got off the addresses list) and Chris who worked a lot in this area. On 2021-10-21 2:34 a.m., Christian König wrote: On 20.10.21 at 21:32, Andrey Grodzovsky wrote: On 2021-10-04 4:14 a.m., Christian König wrote: The problem is a bit different. The callback

Re: Lockdep spalt on killing a processes

2021-10-20 Thread Christian König
On 20.10.21 at 21:32, Andrey Grodzovsky wrote: On 2021-10-04 4:14 a.m., Christian König wrote: The problem is a bit different. The callback is on the dependent fence, while we need to signal the scheduler fence. Daniel is right that this needs an irq_work struct to handle this properly

Re: Lockdep spalt on killing a processes

2021-10-20 Thread Andrey Grodzovsky
On 2021-10-04 4:14 a.m., Christian König wrote: The problem is a bit different. The callback is on the dependent fence, while we need to signal the scheduler fence. Daniel is right that this needs an irq_work struct to handle this properly. Christian. So we had some discussions with Ch

Re: Lockdep spalt on killing a processes

2021-10-04 Thread Andrey Grodzovsky
I see my confusion, we hang all unsubmitted jobs on the last submitted to HW job. Yea, in this case indeed rescheduling to a different thread context will avoid the splat but the schedule work cannot be done for each dependency signalling but rather the way we do for ttm_bo_delayed_delete with

Re: Lockdep spalt on killing a processes

2021-10-04 Thread Christian König
The problem is a bit different. The callback is on the dependent fence, while we need to signal the scheduler fence. Daniel is right that this needs an irq_work struct to handle this properly. Christian. On 01.10.21 at 17:10, Andrey Grodzovsky wrote: From what I see here you supposed to hav
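
The point made above (the callback runs on the dependent fence, but the fence that has to be signaled is the scheduler fence, so the signaling should be deferred) can be illustrated with a small toy model. This is only a sketch in Python, not kernel code: `LockdepModel`, `Fence`, `DeferredWork`, `run_direct` and `run_deferred` are hypothetical names invented for the illustration, and `DeferredWork` merely stands in for the idea of an irq_work-style deferral. It assumes that all fence locks share a single lock class, so that signaling one fence from inside another fence's callback looks like recursive locking to the checker.

```python
from collections import deque


class LockdepModel:
    """Toy stand-in for a lock checker: flags nesting two locks of one class."""

    def __init__(self):
        self.held = []       # lock classes currently held
        self.warnings = []   # recorded same-class nesting events

    def acquire(self, lock_class):
        if lock_class in self.held:
            self.warnings.append(f"possible recursive locking: {lock_class}")
        self.held.append(lock_class)

    def release(self, lock_class):
        self.held.remove(lock_class)


class Fence:
    """Hypothetical fence: callbacks run while the fence lock is held."""

    LOCK_CLASS = "fence->lock"  # assumption: every fence uses the same class

    def __init__(self, name, lockdep):
        self.name = name
        self.lockdep = lockdep
        self.signaled = False
        self.callbacks = []

    def add_callback(self, cb):
        self.callbacks.append(cb)

    def signal(self):
        self.lockdep.acquire(self.LOCK_CLASS)
        try:
            if self.signaled:
                return
            self.signaled = True
            pending, self.callbacks = self.callbacks, []
            for cb in pending:
                cb(self)  # still under the fence lock
        finally:
            self.lockdep.release(self.LOCK_CLASS)


class DeferredWork:
    """Hypothetical queue whose items run later, outside any fence lock."""

    def __init__(self):
        self.queue = deque()

    def queue_work(self, fn):
        self.queue.append(fn)

    def run(self):
        while self.queue:
            self.queue.popleft()()


def run_direct():
    """Signal the scheduler fence directly from the dependency's callback."""
    lockdep = LockdepModel()
    dependency = Fence("dependency", lockdep)
    finished = Fence("scheduler finished", lockdep)
    dependency.add_callback(lambda _fence: finished.signal())
    dependency.signal()
    return finished.signaled, lockdep.warnings


def run_deferred():
    """Only queue the signaling in the callback; run it after the lock drops."""
    lockdep = LockdepModel()
    work = DeferredWork()
    dependency = Fence("dependency", lockdep)
    finished = Fence("scheduler finished", lockdep)
    dependency.add_callback(lambda _fence: work.queue_work(finished.signal))
    dependency.signal()
    work.run()
    return finished.signaled, lockdep.warnings
```

In this model `run_direct()` signals the second fence but records one same-class nesting warning, while `run_deferred()` signals it with no warning, because the second lock is only taken after the first has been released.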

Re: Lockdep spalt on killing a processes

2021-10-01 Thread Andrey Grodzovsky
From what I see here you are supposed to have an actual deadlock and not only a warning, sched_fence->finished is first signaled from within hw fence done callback (drm_sched_job_done_cb) but then again from within its own callback (drm_sched_entity_kill_jobs_cb) and so looks like same fence object is

Re: Lockdep spalt on killing a processes

2021-10-01 Thread Daniel Vetter
On Fri, Oct 01, 2021 at 12:50:35PM +0200, Christian König wrote: > Hey, Andrey. > > while investigating some memory management problems I've got the lockdep > splat below. > > Looks like something is wrong with drm_sched_entity_kill_jobs_cb(), can you > investigate? Probably needs more irq_work s