On 2019/7/10 3:26, Kuehling, Felix wrote:
> On 2019-07-09 8:58 a.m., Zhou, David(ChunMing) wrote:
>> I raised this when Christian did the page fault work; in that patch,
>> amdgpu_job_submit_direct uses an exclusive page fault ring for that.
>>
>> But if you use amdgpu_job_submit_direct for general rings occupied by
>> the scheduler, I guess various bugs will happen.
> The problem is, even the paging ring is used by the scheduler. There are
> several places where buffer operations are submitted to the paging ring
> through the scheduler. That makes any use of the paging ring through
> direct submission problematic.
>
> Even ignoring the scheduler, if it's possible that multiple threads
> submit to the paging ring, we'll need locking to ensure that the
> contents of the ring remain consistent. IIRC, the rings used to have
> locking before we had a GPU scheduler. For comparison, see
> radeon_ring.c, which still has locking. With the GPU scheduler, the
> rings became single-producer queues that no longer needed locking. But
> with direct submission that is no longer true. I think a good place to
> do that locking now would be in amdgpu_ib_schedule.
Yes, that is exactly the reason why we removed the ring lock at that time.
You can add it back when submit_direct co-exists with the scheduler.

-David

>
> Regards,
>    Felix
>
>
>> -David
>>
>> On 2019/7/9 12:53, Kuehling, Felix wrote:
>>> I'm seeing some weird intermittent bugs (VM faults, hangs, etc.) when
>>> trying to use amdgpu_job_submit_direct. I'm wondering if there is a
>>> possibility of a race condition when a submit_direct and a GPU
>>> scheduler thread try to submit to the same ring at the same time. I
>>> didn't see any locking to allow multiple threads to safely submit to
>>> the same ring.
>>>
>>> Am I missing something?
>>>
>>> Thanks,
>>>    Felix
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
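
For illustration only, the locking discussed in this thread can be sketched
in userspace C with pthreads. This is not amdgpu code: struct toy_ring,
toy_ring_submit, toy_ring_demo and the packet size are all hypothetical
names and values made up for the sketch. It only shows the general idea
that once a ring has two producers (standing in for the scheduler thread
and a direct submitter), one lock has to cover the whole multi-dword write
so that packets do not interleave.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* All names and sizes here are hypothetical, chosen only for this sketch. */
#define TOY_PKT_DWORDS  4u    /* dwords written per submission */
#define TOY_SUBMITS     256u  /* submissions per producer thread */
#define TOY_RING_DWORDS (2u * TOY_SUBMITS * TOY_PKT_DWORDS)

struct toy_ring {
	pthread_mutex_t lock;  /* serializes all producers */
	uint32_t wptr;         /* next free dword; the toy ring never wraps */
	uint32_t buf[TOY_RING_DWORDS];
};

struct toy_producer {
	struct toy_ring *ring;
	uint32_t tag;          /* value written into every dword of a packet */
};

/* Write one packet. The lock is held across the whole packet, so two
 * producers cannot interleave their dwords or race on wptr. */
static void toy_ring_submit(struct toy_ring *ring, uint32_t tag)
{
	pthread_mutex_lock(&ring->lock);
	assert(ring->wptr + TOY_PKT_DWORDS <= TOY_RING_DWORDS);
	for (uint32_t i = 0; i < TOY_PKT_DWORDS; i++)
		ring->buf[ring->wptr++] = tag;
	pthread_mutex_unlock(&ring->lock);
}

static void *toy_producer_thread(void *arg)
{
	struct toy_producer *p = arg;

	for (uint32_t i = 0; i < TOY_SUBMITS; i++)
		toy_ring_submit(p->ring, p->tag);
	return NULL;
}

/* Count packets whose dwords do not all carry the same tag. */
static uint32_t toy_ring_count_torn(const struct toy_ring *ring)
{
	uint32_t torn = 0;

	for (uint32_t i = 0; i < ring->wptr; i += TOY_PKT_DWORDS) {
		for (uint32_t j = 1; j < TOY_PKT_DWORDS; j++) {
			if (ring->buf[i + j] != ring->buf[i]) {
				torn++;
				break;
			}
		}
	}
	return torn;
}

/* Run two producers against one ring; return the number of torn packets
 * and report the final write pointer through wptr_out. */
uint32_t toy_ring_demo(uint32_t *wptr_out)
{
	struct toy_ring ring;
	struct toy_producer sched = { &ring, 0x1 };   /* "scheduler" producer */
	struct toy_producer direct = { &ring, 0x2 };  /* "direct" producer */
	pthread_t a, b;
	uint32_t torn;

	memset(&ring, 0, sizeof(ring));
	pthread_mutex_init(&ring.lock, NULL);

	pthread_create(&a, NULL, toy_producer_thread, &sched);
	pthread_create(&b, NULL, toy_producer_thread, &direct);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	torn = toy_ring_count_torn(&ring);
	if (wptr_out)
		*wptr_out = ring.wptr;
	pthread_mutex_destroy(&ring.lock);
	return torn;
}
```

With the lock in place, toy_ring_demo should report no torn packets and a
write pointer equal to the total number of dwords submitted. Dropping the
lock/unlock pair turns the ring back into a queue that is only safe with a
single producer, which is the situation described above.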