As long as we can figure out who touched a certain sync object last,
that would indeed work, yes.
Christian.
On 14.06.21 at 19:10, Marek Olšák wrote:
The call to the hw scheduler has a limitation on the size of all
parameters combined. I think we can only pass a 32-bit sequence number
and a ~16-bit global (per-GPU) syncobj handle in one call and not much
else.
The syncobj handle can be an element index in a global (per-GPU)
syncobj table, and it's read-only for all processes with the exception
of the signal command. Syncobjs can either have per-VMID write access
flags for the signal command (slow), or any process can write to any
syncobj and only rely on the kernel checking the write log (fast).
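For illustration, a minimal sketch of what such a per-GPU table could
look like; all names and the exact field layout here are assumptions,
not existing code:

#include <stdint.h>

/* Hypothetical per-GPU syncobj table entry; names are made up. */
struct gpu_syncobj {
    uint64_t seq;       /* last signaled timeline value, memory backed */
    uint32_t last_vmid; /* recorded by hw for the kernel's write log */
    uint32_t flags;     /* e.g. per-VMID write access for the slow path */
};

/* The ~16-bit handle is just an index into the table, so one scheduler
 * call only needs handle + 32-bit sequence number as parameters. */
static inline struct gpu_syncobj *syncobj_lookup(struct gpu_syncobj *table,
                                                 uint16_t handle)
{
    return &table[handle];
}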
In any case, we can execute the memory write in the queue engine and
only use the hw scheduler for logging, which would be perfect.
Marek
On Thu, Jun 10, 2021 at 12:33 PM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:
Hi guys,
maybe soften that a bit. Reading from the shared memory of the
user fence is ok for everybody. What we need to take more care of
is the writing side.
So my current thinking is that we allow read-only access, but
writing a new sequence value needs to go through the scheduler/kernel.
So when the CPU wants to signal a timeline fence it needs to call
an IOCTL. When the GPU wants to signal the timeline fence it needs
to hand that off to the hardware scheduler.
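As a rough sketch of the CPU path, assuming a new ioctl; neither the
struct nor the ioctl number below exists in any current uapi:

#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical uapi: CPU-side signaling goes through the kernel so the
 * write can be validated and logged. */
struct drm_user_fence_signal {
    uint32_t handle; /* index into the per-GPU syncobj table */
    uint32_t pad;
    uint64_t seq;    /* timeline point to signal */
};

#define DRM_IOCTL_USER_FENCE_SIGNAL \
    _IOW('d', 0x99, struct drm_user_fence_signal)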
In case of a lockup the kernel can check with the hardware who did the
last write and what value was written.
That, together with an IOCTL to give out sequence numbers for
implicit sync to applications, should be sufficient for the kernel
to track who is responsible if something bad happens.
In other words, when the hardware says that the shader wrote stuff
like 0xdeadbeef, 0x0 or 0xffffffff into memory, we kill the process
that did that.
If the hardware says that seq - 1 was written fine, but seq is
missing, then the kernel blames whoever was supposed to write seq.
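A minimal sketch of that blame logic, with assumed names for what the
hardware write log reports:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry from the hardware write log. */
struct fence_write_log {
    uint64_t seq;  /* last sequence value written */
    uint32_t vmid; /* who wrote it */
};

/* seq - 1 arrived fine but seq never did: blame whoever was supposed
 * to write seq. Bogus values (0xdeadbeef etc.) instead get the writer
 * killed based on the logged vmid. */
static bool blame_missing_writer(const struct fence_write_log *last,
                                 uint64_t expected_seq)
{
    return last->seq == expected_seq - 1;
}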
Just piping the write through a privileged instance should be
fine to make sure that we don't run into issues.
Christian.
On 10.06.21 at 17:59, Marek Olšák wrote:
Hi Daniel,
We just talked about this whole topic internally and we came
to the conclusion that the hardware needs to understand sync
object handles and have high-level wait and signal operations in
the command stream. Sync objects will be backed by memory, but
they won't be readable or writable by processes directly. The
hardware will log all accesses to sync objects and will send the
log to the kernel periodically. The kernel will identify
malicious behavior.
Example of a hardware command stream:
...
ImplicitSyncWait(syncObjHandle, sequenceNumber); // sequence number assigned by the kernel
Draw();
ImplicitSyncSignalWhenDone(syncObjHandle);
...
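Functionally, assuming a sync object is backed by a 64-bit
monotonically increasing value in memory, the two commands would
behave roughly like this CPU model of the hardware semantics (not
code the driver actually runs):

#include <stdint.h>

/* Queue blocks until the timeline has reached seq. */
static void implicit_sync_wait(volatile uint64_t *syncobj, uint64_t seq)
{
    while (*syncobj < seq)
        ; /* real hw would sleep the queue, and log the access */
}

/* Runs after the draw completes; seq is the value the kernel assigned
 * for this submission. The hardware logs the write for the kernel. */
static void implicit_sync_signal_when_done(volatile uint64_t *syncobj,
                                           uint64_t seq)
{
    *syncobj = seq;
}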
I'm afraid we have no other choice because of the TLB
invalidation overhead.
Marek
On Wed, Jun 9, 2021 at 2:31 PM Daniel Vetter <dan...@ffwll.ch> wrote:
On Wed, Jun 09, 2021 at 03:58:26PM +0200, Christian König wrote:
> Am 09.06.21 um 15:19 schrieb Daniel Vetter:
> > [SNIP]
> > > Yeah, we call this the lightweight and the heavyweight TLB flush.
> > >
> > > The lightweight one can be used when you are sure that you don't have
> > > any of the PTEs currently in flight in the 3D/DMA engine and you just
> > > need to invalidate the TLB.
> > >
> > > The heavyweight one must be used when you need to invalidate the TLB
> > > *AND* make sure that no concurrent operation moves new stuff into the TLB.
> > >
> > > The problem is for this use case we have to use the heavyweight one.
> > Just for my own curiosity: So the lightweight flush is only for
> > in-between CS when you know access is idle? Or does that also not work
> > if userspace has a CS on a DMA engine going at the same time because
> > the TLBs aren't isolated enough between engines?
>
> More or less correct, yes.
>
> The problem is a lightweight flush only invalidates the TLB, but doesn't
> take care of entries which have been handed out to the different engines.
>
> In other words what can happen is the following:
>
> 1. Shader asks TLB to resolve address X.
> 2. TLB looks into its cache and can't find address X so it asks the
> walker to resolve.
> 3. Walker comes back with result for address X and TLB puts that into
> its cache and gives it to Shader.
> 4. Shader starts doing some operation using result for address X.
> 5. You send lightweight TLB invalidate and TLB throws away cached
> values for address X.
> 6. Shader happily still uses whatever the TLB gave to it in step 3 to
> access address X.
>
> See it like the shader has its own 1-entry L0 TLB cache which is not
> affected by the lightweight flush.
>
> The heavyweight flush on the other hand sends out a broadcast signal to
> everybody and only comes back when we are sure that an address is not
> in use any more.
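(In driver terms, the rule above could be summarized roughly like this
sketch; the names are assumed, not taken from any real driver:)

#include <stdbool.h>

enum tlb_flush_type { TLB_FLUSH_LIGHT, TLB_FLUSH_HEAVY };

/* The lightweight flush only drops cached translations; it cannot
 * revoke entries already handed out to an engine (the "L0" copy
 * above). The heavyweight flush broadcasts to everybody and waits
 * until the address is no longer in use anywhere. */
static enum tlb_flush_type pick_flush(bool ptes_may_be_in_flight)
{
    return ptes_may_be_in_flight ? TLB_FLUSH_HEAVY : TLB_FLUSH_LIGHT;
}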
Ah makes sense. On Intel the shaders only operate in VA, everything
goes around as explicit async messages to IO blocks. So we don't have
this; the only difference in TLB flushes is between a TLB flush in the
IB and an mmio one, which is independent from anything currently being
executed on an engine.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch