On 16.08.2018 at 18:50, Felix Kuehling wrote:
On 2018-08-16 02:43 AM, Christian König wrote:
[SNIP]
I mean it could be that in the worst case we race and stop a KFD
process for no good reason.
Right. For a more practical example, a KFD BO can get evicted just
before the application decides to unmap it. The preemption happens
asynchronously, handled by an SDMA job in the GPU scheduler. That job
will have an amdgpu_sync object with the eviction fence in it.

While that SDMA job is pending or in progress, the application decides
to unmap the BO. That removes the eviction fence from that BO's
reservation. But it can't remove the fence from all the sync objects
that were previously created and are still in flight. So the preemption
will be triggered, and the fence will eventually signal when the KFD
preemption is complete.

I don't think that's something we can prevent. The worst case is that a
preemption happens unnecessarily if an eviction gets triggered just
before removing the fence. But removing the fence will prevent future
evictions of the BO from triggering a KFD process preemption. That's the
best we can do.
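
The race described above can be sketched with a small toy model. This is not kernel code: the classes `Fence`, `BO`, `SyncObject` and `KFDProcess` and the function `simulate` are hypothetical stand-ins invented for illustration, and the model assumes that a sync object simply takes a snapshot of the reservation's fences when the job is created.

```python
class Fence:
    """Toy stand-in for the KFD eviction fence (not the kernel's dma_fence)."""
    def __init__(self):
        self.signaled = False


class BO:
    """Toy buffer object; resv_fences models the fences in its reservation."""
    def __init__(self):
        self.resv_fences = set()


class SyncObject:
    """Toy stand-in for amdgpu_sync: snapshots the fences at job creation."""
    def __init__(self, bo):
        self.fences = set(bo.resv_fences)


class KFDProcess:
    """Toy KFD process that counts how often it gets preempted."""
    def __init__(self):
        self.eviction_fence = Fence()
        self.preemptions = 0

    def enable_signaling(self, fence):
        # Assumption: waiting on the eviction fence forces a preemption,
        # after which the fence signals.
        if fence is self.eviction_fence and not fence.signaled:
            self.preemptions += 1
            fence.signaled = True


def simulate(unmap_first):
    """Return the number of preemptions for one eviction/unmap ordering."""
    process = KFDProcess()
    bo = BO()
    bo.resv_fences.add(process.eviction_fence)

    if unmap_first:
        # Unmap wins: the fence is gone before the eviction job is built.
        bo.resv_fences.discard(process.eviction_fence)
        sync = SyncObject(bo)
    else:
        # Eviction wins: the in-flight sync object keeps the fence even
        # though the unmap removes it from the reservation afterwards.
        sync = SyncObject(bo)
        bo.resv_fences.discard(process.eviction_fence)

    # The SDMA job runs and waits on every fence in its sync object.
    for fence in sync.fences:
        process.enable_signaling(fence)
    return process.preemptions
```

In this model, `simulate(unmap_first=False)` yields one preemption that turns out to be unnecessary, while `simulate(unmap_first=True)` yields none, which matches the point that removing the fence only protects against future evictions.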

It's true that you can't drop the SDMA job that wants to evict the BO, but at that point the fence signaling is already underway and can no longer be stopped.

Replacing the fence with a new one would be much cleaner and would fix quite a few corner cases where the KFD process would be preempted without good reason.

Doing so probably costs quite a bit more CPU overhead, but I think it would still be the more fail-safe option.

Regards,
Christian.



Regards,
   Felix


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
