A later bad compute job can block a good gfx job
The gang submit/barrier approach makes sure that only one application at
a time can use the gfx/compute block.
So when application B makes a compute submission while a GFX submission
of application A is still running, we will wait for that GFX
The series is,
Acked-by: Luben Tuikov
We don't want the kernel to be in the business of retrying clients'
requests. Instead we want the kernel to provide a conduit for such
requests to be sent, executed by the GPU, and a result returned.
If the kernel cannot process requests for any reason, e.g.
Hi, Christian
A later bad compute job can block a good gfx job, so the original TDR
design finds the wrong guilty job (the good gfx job).
Advanced TDR re-submits jobs in order to find the real guilty job (the
bad compute job).
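
To make the difference between the two policies concrete, here is a
minimal toy model, not the amdgpu implementation. The Job class, its
"hangs" flag, and both function names are hypothetical and exist only for
illustration; the flag stands in for "re-submit the job alone and see
whether its fence signals before the timeout".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Job:
    name: str
    # Hypothetical flag: True if the job hangs the hardware when run alone.
    hangs: bool


def blame_first_timeout(timed_out: Job) -> Job:
    """Original policy in this model: blame the job whose timeout fired."""
    return timed_out


def blame_by_resubmission(pending: List[Job]) -> Optional[Job]:
    """Re-run pending jobs one at a time; the first that hangs is guilty."""
    for job in pending:
        # Stand-in for: re-submit this job alone, wait for fence or timeout.
        if job.hangs:
            return job
    return None


gfx = Job("good gfx job", hangs=False)
compute = Job("bad compute job", hangs=True)

# The gfx job was submitted first and is blocked by the later compute job,
# so its timeout is the one that fires.
wrongly_blamed = blame_first_timeout(gfx)
really_guilty = blame_by_resubmission([gfx, compute])
```

In this model the first policy blames the gfx job simply because its
timer expired, while serial re-submission singles out the compute job.
Note that the second policy depends on running jobs strictly one at a
time.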
After reverting this commit, how does the new gang-submit promise the
isolat
This reverts commit e6c6338f393b74ac0b303d567bb918b44ae7ad75.
This feature basically re-submits one job after another to
figure out which one was the one causing a hang.
This is obviously incompatible with gang-submit which requires
that multiple jobs run at the same time. It's also absolutely
no