On 28.05.20 at 21:35, Marek Olšák wrote:
On Thu, May 28, 2020 at 2:12 PM Christian König <christian.koe...@amd.com> wrote:

    Am 28.05.20 um 18:06 schrieb Marek Olšák:
    On Thu, May 28, 2020 at 10:40 AM Christian König <christian.koe...@amd.com> wrote:

        On 28.05.20 at 12:06, Michel Dänzer wrote:
        > On 2020-05-28 11:11 a.m., Christian König wrote:
        >> Well we still need implicit sync [...]
        > Yeah, this isn't about "we don't want implicit sync", it's about
        > "amdgpu doesn't ensure later jobs fully see the effects of previous
        > implicitly synced jobs", requiring userspace to do pessimistic flushing.

        Yes, exactly that.

        For the background: We also do this flushing for explicit syncs. And
        when this was implemented 2-3 years ago we first did the flushing for
        implicit sync as well.

        That was immediately reverted and then implemented differently
        because it caused severe performance problems in some use cases.

        I'm not sure of the root cause of these performance problems. My
        assumption was always that we then insert too many pipeline syncs,
        but Marek doesn't seem to think it could be that.

        On the one hand I'm rather keen to remove the extra handling and just
        always use the explicit handling for everything because it simplifies
        the kernel code quite a bit. On the other hand I don't want to run
        into this performance problem again.

        In addition to that, what the kernel does is a "full" pipeline sync,
        i.e. we busy-wait for the full hardware pipeline to drain. That might
        be overkill if you just want to do some flushing so that the next
        shader sees the stuff written, but I'm not an expert on that.


    Do we busy-wait on the CPU or in WAIT_REG_MEM?

    WAIT_REG_MEM is what UMDs do and should be faster.

    We use WAIT_REG_MEM to wait for an EOP fence value to reach memory.

    We use this for a couple of things, especially to make sure that the
    hardware is idle before changing VMID-to-page-table associations.
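
    For illustration, a rough, standalone sketch of how such a GPU-side wait
    on an EOP fence value is typically encoded as a PM4 WAIT_REG_MEM packet.
    This is not the actual amdgpu kernel code; the helper name, the macro,
    the field encodings and the ">=" compare function are my assumptions.

    #include <stdint.h>

    /* Type-3 PM4 header: count = number of body dwords minus one. */
    #define PM4_TYPE3_HDR(op, count) \
            ((3u << 30) | (((count) & 0x3fffu) << 16) | (((op) & 0xffu) << 8))
    #define PM4_WAIT_REG_MEM 0x3c   /* PM4 opcode, 6 body dwords -> count 5 */

    /* Emit a GPU-side wait for *fence_addr >= seq into a command stream. */
    static uint32_t *emit_wait_reg_mem(uint32_t *cs, uint64_t fence_addr,
                                       uint32_t seq)
    {
            *cs++ = PM4_TYPE3_HDR(PM4_WAIT_REG_MEM, 5);
            *cs++ = (1u << 4) |   /* MEM_SPACE: poll memory, not a register */
                    (5u << 0);    /* FUNCTION: ">=" compare (assumption)    */
            *cs++ = (uint32_t)fence_addr;          /* poll address, low     */
            *cs++ = (uint32_t)(fence_addr >> 32);  /* poll address, high    */
            *cs++ = seq;          /* reference value written by the EOP event */
            *cs++ = 0xffffffffu;  /* compare mask                           */
            *cs++ = 4;            /* poll interval                          */
            return cs;
    }

    The command processor spins on this packet until the fence value lands in
    memory, so the wait happens on the GPU front end rather than on the CPU.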

    What about your idea of having an extra dw in the shared BOs
    indicating that they are flushed?

    As far as I understand it, an EOS or other event might be sufficient for
    the caches as well. And you could insert the WAIT_REG_MEM directly before
    the first draw using the texture and not before the whole IB.

    It could be that we can optimize this even more than what we do in the
    kernel.

    Christian.


Adding fences into BOs would be bad, because all UMDs would have to handle them.

Yeah, I already assumed that this is the biggest blocker.

Is it possible to do this in the ring buffer:
if (fence_signalled) {
   /* Dependency already resolved: no wait needed, run the dependent IB first. */
   indirect_buffer(dependent_IB);
   indirect_buffer(other_IB);
} else {
   /* Run the independent IB first, then wait for the fence before the
      dependent IB, so the wait overlaps with useful work. */
   indirect_buffer(other_IB);
   wait_reg_mem(fence);
   indirect_buffer(dependent_IB);
}
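
For illustration only (this is an assumption on my side, not something the
kernel does today): the building block that could make the wait itself
self-skipping is the PM4 COND_EXEC packet, which as far as I know tells the
CP to skip the following N dwords when a 32-bit predicate word in memory
reads 0. If that word were cleared once the fence signals, the pipeline sync
would simply be jumped over at execution time. The opcode, field layout and
helper name below are a simplified sketch.

#include <stdint.h>

/* Type-3 PM4 header: count = number of body dwords minus one. */
#define PM4_TYPE3_HDR(op, count) \
        ((3u << 30) | (((count) & 0x3fffu) << 16) | (((op) & 0xffu) << 8))
#define PM4_COND_EXEC 0x22      /* PM4 opcode, 4 body dwords -> count 3 */

/* Skip 'skip_dw' dwords of following packets when *pred_addr reads 0. */
static uint32_t *emit_cond_exec(uint32_t *cs, uint64_t pred_addr,
                                uint32_t skip_dw)
{
        *cs++ = PM4_TYPE3_HDR(PM4_COND_EXEC, 3);
        *cs++ = (uint32_t)pred_addr;           /* predicate address, low  */
        *cs++ = (uint32_t)(pred_addr >> 32);   /* predicate address, high */
        *cs++ = 0;                             /* reserved                */
        *cs++ = skip_dw;                       /* dwords to skip if zero  */
        return cs;
}

Even with that, reordering dependent_IB vs. other_IB would still have to be
decided somewhere at execution time.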

That might be possible, but at least it's not easy to implement.

Or we might have to wait for a hw scheduler.

I'm still fine with doing the pipeline sync for implicit sync as well; I just need somebody to confirm to me that this doesn't backfire in some cases.


Does the kernel sync when the driver fd is different, or when the context is different?

Only when the driver fd is different.
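
Sketched out with invented names, that decision boils down to the following;
the real code in amdgpu is more involved, so treat this only as an
illustration of "sync only across different fds":

#include <stdbool.h>

struct client    { int fd; };                 /* one per open() of the drm device   */
struct gpu_fence { struct client *owner; };   /* implicit fence + submitting client */

/* Decide whether a new job from 'job_owner' needs a pipeline sync on 'fence'. */
static bool needs_pipeline_sync(const struct gpu_fence *fence,
                                const struct client *job_owner)
{
        /* Same fd: the same UMD instance is expected to order its own work,
           so no kernel-inserted sync.  Different fd: insert the sync so the
           later job fully sees the earlier job's writes. */
        return fence->owner != job_owner;
}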

Christian.


Marek
