amdgpu: Moved fault hash table to amdgpu vm

Christian König Mon, 10 Sep 2018 02:42:57 -0700

Hi Felix & Oak,

over the weekend I had the idea that we could just use the shadow BOs tohave the current state in a page fault. They are GTT BOs and CPUaccessible anyway.


Regards,
Christian.

Am 08.09.2018 um 09:34 schrieb Christian König:

Hi Felix,

But why do you want to update page tables when there is no more user
mode context that cares about them? Is this just to allow pending work
to complete after the user mode context is gone? To prevent hangs?

The problem I'm see is that we still don't have a reliable way ofkilling a command submission while making sure that no more faults ofit are in the flight.

E.g. the HWS has a function to kill a running job on the hardware, butthat only works for compute and there is no way of telling when thatoperation has finished.

But Oak has already convinced me that we should probably work on thatinstead of trying to keep the VM alive as long as possible.


Regards,
Christian.

Am 08.09.2018 um 02:27 schrieb Felix Kuehling:

Hi Christian,

I'm not sure I get your point here.

As I understand it, the amdgpu_vm structure represents the user mode VM
context. It has the pointers to the root page directory and the rest of
the page table hierarchy. If there is no amdgpu_vm, there is no user
mode context that cares about the page tables.

If we leave the page table pointers in the amdgpu_vm, then handling
faults after the VM is destroyed is pointless. We can't find the page
tables and we can't update them, so there is nothing we can do in
response to VM faults.

So I'm guessing that you want to move the page table pointers into a
different structure that exists half-independently of the VM. It would
be created when the VM is created (that's where we currently allocate
the PASID) but can be destroyed slightly later.

But why do you want to update page tables when there is no more user
mode context that cares about them? Is this just to allow pending work
to complete after the user mode context is gone? To prevent hangs?

Regards,
   Felix

On 2018-09-10 07:20 AM, Christian König wrote:

Am I right that the handling of page fault need a valid VM?

No, exactly that view is incorrect.

The VM is meant to be a memory management structure of page tables and
is completely pointless fault processing because it represent the
future state of the page tables and not the current one.

What we need for fault processing is a new structure build around
PASIDs which is feed by the with addresses when page tables are moved
around.

Alternatively I hope that we can use the SDMA to walk the page tables
and update the required entries by just using the address.

Regards,
Christian.

Am 07.09.2018 um 16:55 schrieb Zeng, Oak:

Hi Christian,


What is the value of delay the destroy of hash to after vm is
destroyed? Since vm is destroyed, you basically don’t have enough
information to paging in the correct page to gpuvm. Am I right that
the handling of page fault need a valid VM? For example, you need the
VM to get the corresponding physical address of the faulty page.


After vm is destroyed, the retry interrupt can be directly discard as
you can’t find vm through pasid. You can think this is also one kind
of prescreen.


So I am stand for the point that, there is no need to delay the
destroy of hash to after vm is destroyed. Prescreening hash can be
destroyed at the time of vm_fini.


Thanks,

Oak


*From:*Christian König <ckoenig.leichtzumer...@gmail.com>
*Sent:* Friday, September 07, 2018 4:39 AM
*To:* Zeng, Oak <oak.z...@amd.com>; Oak Zeng <zengshan...@gmail.com>;
amd-gfx@lists.freedesktop.org
*Cc:* Zeng, Oak <oak.z...@amd.com>
*Subject:* Re: [PATCH 1/2] drm/amdgpu: Moved fault hash table to
amdgpu vm


Hi Oak,

yeah, but this is still a NAK. Making the hash per PASID was not a
suggestion but rather a must have.

The VM structures must be destroyed while the processing is still
ongoing, or otherwise we would not have a clean OOM handling.

The IDR for PASID lockup can be put into amdgpu_ids.c, you can just
replace the IDA already used there with an IDR.

Since the PASID is already freed up delayed we should have the grace
period for interrupt processing included. If that still isn't
sufficient we can still add some delayed work for this.

Regards,
Christian.

Am 07.09.2018 um 06:16 schrieb ozeng:

     Hi Christian,

     In this implementation, fault hash is made per vm, not per pasid
     as suggested, based on below considerations:

       * Delay the destroy of hash introduces more effort like how to
         set the proper grace period after which no retry interrupt
         will be generated. We want to avoid those complication if we
         can.
       * The  problem of the per vm implementation is that it is hard
         to delay the destroy of fault hash, because when vm is
         destroyed, prescreen function won't be able to retrieve the
         fault hash. But in this case, the prescreen function can
         either let the interrupt go through (to gmc) or ignore it.
         From this perspective, it is not very necessary to delay the
         destroy of hash table.
       * On the other hand, since ASICs after vega10 have the HW CAM
         feature. So the SW IV prescreen is only useful for vega10.
         For all the HW implemented CAM, we can consider the delayed
         CAM acknowledgment.
       * Making it per pasid need to introduce another idr to
         correlate pasid and hash table - where to put the idr? You
         will have to make it a global variable which is not very
         desirable.

     The main purpose of this patch is to solve the inter-process lock
     issue (when hash table is full), while still keep codes simple.

     Also with this patch, the faults kfifo is not necessary any more.
     See patch 2.

     Regards,

     Oak


     On 09/06/2018 11:28 PM, Oak Zeng wrote:

         In stead of share one fault hash table per device, make it

         per vm. This can avoid inter-process lock issue when fault

         hash table is full.


         Change-Id: I5d1281b7c41eddc8e26113e010516557588d3708

Signed-off-by: Oak Zeng <oak.z...@amd.com><mailto:oak.z...@amd.com>

Suggested-by: Christian Konig <christian.koe...@amd.com><mailto:christian.koe...@amd.com>

Suggested-by: Felix Kuehling <felix.kuehl...@amd.com><mailto:felix.kuehl...@amd.com>

---

drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 75------------------------


          drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h |  10 ----

drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 102++++++++++++++++++++++++++++++++-


          drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  12 ++++

          drivers/gpu/drm/amd/amdgpu/vega10_ih.c |  38 +++++-------

          5 files changed, 127 insertions(+), 110 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.cb/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c


         index 06373d4..4ed8621 100644

         --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c

         +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c

@@ -197,78 +197,3 @@ int amdgpu_ih_process(structamdgpu_device *adev)


            return IRQ_HANDLED;

          }


         -/**

         - * amdgpu_ih_add_fault - Add a page fault record

         - *

         - * @adev: amdgpu device pointer

         - * @key: 64-bit encoding of PASID and address

         - *

- * This should be called when a retry page faultinterrupt is

- * received. If this is a new page fault, it will beadded to a hash

- * table. The return value indicates whether this is anew fault, or

- * a fault that was already known and is already beinghandled.

- *

- * If there are too many pending page faults, this willfail. Retry

- * interrupts should be ignored in this case until thereis enough


         - * free space.

         - *

- * Returns 0 if the fault was added, 1 if the fault wasalready known,


         - * -ENOSPC if there are too many pending faults.

         - */

         -int amdgpu_ih_add_fault(struct amdgpu_device *adev, u64 key)

         -{

         -  unsigned long flags;

         -  int r = -ENOSPC;

         -

         -  if (WARN_ON_ONCE(!adev->irq.ih.faults))

- /* Should be allocated in <IP>_ih_sw_init onGPUs that

- * support retry faults and require retryfiltering.


         -           */

         -          return r;

         -

         - spin_lock_irqsave(&adev->irq.ih.faults->lock, flags);

         -

- /* Only let the hash table fill up to 50% for bestperformance */

- if (adev->irq.ih.faults->count >= (1 <<(AMDGPU_PAGEFAULT_HASH_BITS-1)))


         -          goto unlock_out;

         -

- r = chash_table_copy_in(&adev->irq.ih.faults->hash,key, NULL);


         -  if (!r)

         -          adev->irq.ih.faults->count++;

         -

- /* chash_table_copy_in should never fail unless we'relosing count */


         -  WARN_ON_ONCE(r < 0);

         -

         -unlock_out:

         - spin_unlock_irqrestore(&adev->irq.ih.faults->lock, flags);

         -  return r;

         -}

         -

         -/**

         - * amdgpu_ih_clear_fault - Remove a page fault record

         - *

         - * @adev: amdgpu device pointer

         - * @key: 64-bit encoding of PASID and address

         - *

- * This should be called when a page fault has beenhandled. Any


         - * future interrupt with this key will be processed as a new

         - * page fault.

         - */

-void amdgpu_ih_clear_fault(struct amdgpu_device *adev,u64 key)


         -{

         -  unsigned long flags;

         -  int r;

         -

         -  if (!adev->irq.ih.faults)

         -          return;

         -

         - spin_lock_irqsave(&adev->irq.ih.faults->lock, flags);

         -

- r = chash_table_remove(&adev->irq.ih.faults->hash, key,NULL);


         -  if (!WARN_ON_ONCE(r < 0)) {

         -          adev->irq.ih.faults->count--;

         - WARN_ON_ONCE(adev->irq.ih.faults->count < 0);

         -  }

         -

         - spin_unlock_irqrestore(&adev->irq.ih.faults->lock, flags);

         -}

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.hb/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h


         index a23e1c0..f411ffb 100644

         --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h

         +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h

         @@ -32,13 +32,6 @@ struct amdgpu_device;

          #define AMDGPU_IH_CLIENTID_LEGACY 0

          #define AMDGPU_IH_CLIENTID_MAX SOC15_IH_CLIENTID_MAX


         -#define AMDGPU_PAGEFAULT_HASH_BITS 8

         -struct amdgpu_retryfault_hashtable {

- DECLARE_CHASH_TABLE(hash, AMDGPU_PAGEFAULT_HASH_BITS,8, 0);


         -  spinlock_t     lock;

         -  int            count;

         -};

         -

          /*

           * R6xx+ IH ring

           */

         @@ -57,7 +50,6 @@ struct amdgpu_ih_ring {

            bool                   use_doorbell;

            bool                   use_bus_addr;

dma_addr_t rb_dma_addr; /* only used whenuse_bus_addr = true */


         -  struct amdgpu_retryfault_hashtable *faults;

          };


          #define AMDGPU_IH_SRC_DATA_MAX_SIZE_DW 4

@@ -95,7 +87,5 @@ int amdgpu_ih_ring_init(structamdgpu_device *adev, unsigned ring_size,


                            bool use_bus_addr);

          void amdgpu_ih_ring_fini(struct amdgpu_device *adev);

          int amdgpu_ih_process(struct amdgpu_device *adev);

-int amdgpu_ih_add_fault(struct amdgpu_device *adev, u64key);

-void amdgpu_ih_clear_fault(struct amdgpu_device *adev,u64 key);



          #endif

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.cb/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c


         index 1d7e3c1..8b220e9 100644

         --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

         +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

@@ -2692,6 +2692,22 @@ void amdgpu_vm_adjust_size(structamdgpu_device *adev, uint32_t min_vm_size,


                     adev->vm_manager.fragment_size);

          }

+static struct amdgpu_retryfault_hashtable*init_fault_hash(void)


         +{

         +  struct amdgpu_retryfault_hashtable *fault_hash;

         +

         +  fault_hash = kmalloc(sizeof(*fault_hash), GFP_KERNEL);

         +  if (!fault_hash)

         +          return fault_hash;

         +

         +  INIT_CHASH_TABLE(fault_hash->hash,

         +                  AMDGPU_PAGEFAULT_HASH_BITS, 8, 0);

         +  spin_lock_init(&fault_hash->lock);

         +  fault_hash->count = 0;

         +

         +  return fault_hash;

         +}

         +

          /**

           * amdgpu_vm_init - initialize a vm instance

           *

@@ -2780,6 +2796,12 @@ int amdgpu_vm_init(structamdgpu_device *adev, struct amdgpu_vm *vm,


                    vm->pasid = pasid;

            }


         +  vm->fault_hash = init_fault_hash();

         +  if (!vm->fault_hash) {

         +          r = -ENOMEM;

         +          goto error_free_root;

         +  }

         +

            INIT_KFIFO(vm->faults);

            vm->fault_credit = 16;

@@ -2973,7 +2995,7 @@ void amdgpu_vm_fini(structamdgpu_device *adev, struct amdgpu_vm *vm)

/* Clear pending page faults from IH when the VM isdestroyed */


            while (kfifo_get(&vm->faults, &fault))

         -          amdgpu_ih_clear_fault(adev, fault);

         +          amdgpu_vm_clear_fault(vm->fault_hash, fault);


            if (vm->pasid) {

                    unsigned long flags;

@@ -2983,6 +3005,9 @@ void amdgpu_vm_fini(structamdgpu_device *adev, struct amdgpu_vm *vm)


spin_unlock_irqrestore(&adev->vm_manager.pasid_lock, flags);

            }


         +  kfree(vm->fault_hash);

         +  vm->fault_hash = NULL;

         +

            drm_sched_entity_destroy(&vm->entity);


            if (!RB_EMPTY_ROOT(&vm->va.rb_root)) {

@@ -3183,3 +3208,78 @@ void amdgpu_vm_set_task_info(structamdgpu_vm *vm)

+ * amdgpu_vm_add_fault - Add a page fault record to faulthash table


         + *

         + * @fault_hash: fault hash table

         + * @key: 64-bit encoding of PASID and address

         + *

+ * This should be called when a retry page faultinterrupt is

+ * received. If this is a new page fault, it will beadded to a hash

+ * table. The return value indicates whether this is anew fault, or

+ * a fault that was already known and is already beinghandled.

+ *

+ * If there are too many pending page faults, this willfail. Retry

+ * interrupts should be ignored in this case until thereis enough


         + * free space.

         + *

+ * Returns 0 if the fault was added, 1 if the fault wasalready known,


         + * -ENOSPC if there are too many pending faults.

         + */

+int amdgpu_vm_add_fault(structamdgpu_retryfault_hashtable *fault_hash, u64 key)


         +{

         +  unsigned long flags;

         +  int r = -ENOSPC;

         +

         +  if (WARN_ON_ONCE(!fault_hash))

         +          /* Should be allocated in amdgpu_vm_init

         +           */

         +          return r;

         +

         +  spin_lock_irqsave(&fault_hash->lock, flags);

         +

+ /* Only let the hash table fill up to 50% for bestperformance */

+ if (fault_hash->count >= (1 <<(AMDGPU_PAGEFAULT_HASH_BITS-1)))


         +          goto unlock_out;

         +

         +  r = chash_table_copy_in(&fault_hash->hash, key, NULL);

         +  if (!r)

         +          fault_hash->count++;

         +

+ /* chash_table_copy_in should never fail unless we'relosing count */


         +  WARN_ON_ONCE(r < 0);

         +

         +unlock_out:

         +  spin_unlock_irqrestore(&fault_hash->lock, flags);

         +  return r;

         +}

         +

         +/**

         + * amdgpu_vm_clear_fault - Remove a page fault record

         + *

         + * @fault_hash: fault hash table

         + * @key: 64-bit encoding of PASID and address

         + *

+ * This should be called when a page fault has beenhandled. Any


         + * future interrupt with this key will be processed as a new

         + * page fault.

         + */

+void amdgpu_vm_clear_fault(structamdgpu_retryfault_hashtable *fault_hash, u64 key)


         +{

         +  unsigned long flags;

         +  int r;

         +

         +  if (!fault_hash)

         +          return;

         +

         +  spin_lock_irqsave(&fault_hash->lock, flags);

         +

         +  r = chash_table_remove(&fault_hash->hash, key, NULL);

         +  if (!WARN_ON_ONCE(r < 0)) {

         +          fault_hash->count--;

         +          WARN_ON_ONCE(fault_hash->count < 0);

         +  }

         +

         +  spin_unlock_irqrestore(&fault_hash->lock, flags);

         +}

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.hb/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h


         index e275ee7..6eb1da1 100644

         --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

         +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

         @@ -178,6 +178,13 @@ struct amdgpu_task_info {

            pid_t   tgid;

          };


         +#define AMDGPU_PAGEFAULT_HASH_BITS 8

         +struct amdgpu_retryfault_hashtable {

+ DECLARE_CHASH_TABLE(hash, AMDGPU_PAGEFAULT_HASH_BITS,8, 0);


         +  spinlock_t     lock;

         +  int            count;

         +};

         +

          struct amdgpu_vm {

            /* tree of virtual addresses mapped */

            struct rb_root_cached  va;

         @@ -240,6 +247,7 @@ struct amdgpu_vm {

            struct ttm_lru_bulk_move lru_bulk_move;

            /* mark whether can do the bulk move */

            bool                   bulk_moveable;

         +  struct amdgpu_retryfault_hashtable *fault_hash;

          };


          struct amdgpu_vm_manager {

@@ -355,4 +363,8 @@ void amdgpu_vm_set_task_info(structamdgpu_vm *vm);


          void amdgpu_vm_move_to_lru_tail(struct amdgpu_device *adev,

                                   struct amdgpu_vm *vm);

+int amdgpu_vm_add_fault(structamdgpu_retryfault_hashtable *fault_hash, u64 key);

+void amdgpu_vm_clear_fault(structamdgpu_retryfault_hashtable *fault_hash, u64 key);


         +

          #endif

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.cb/drivers/gpu/drm/amd/amdgpu/vega10_ih.c


         index 5ae5ed2..acbe5a7 100644

         --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c

         +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c

@@ -265,35 +265,36 @@ static boolvega10_ih_prescreen_iv(struct amdgpu_device *adev)


                    return true;

            }


         -  addr = ((u64)(dw5 & 0xf) << 44) | ((u64)dw4 << 12);

         -  key = AMDGPU_VM_FAULT(pasid, addr);

         -  r = amdgpu_ih_add_fault(adev, key);

         -

- /* Hash table is full or the fault is already beingprocessed,


         -   * ignore further page faults

         -   */

         -  if (r != 0)

         -          goto ignore_iv;

         -

            /* Track retry faults in per-VM fault FIFO. */

            spin_lock(&adev->vm_manager.pasid_lock);

            vm = idr_find(&adev->vm_manager.pasid_idr, pasid);

         +  addr = ((u64)(dw5 & 0xf) << 44) | ((u64)dw4 << 12);

         +  key = AMDGPU_VM_FAULT(pasid, addr);

            if (!vm) {

                    /* VM not found, process it normally */

spin_unlock(&adev->vm_manager.pasid_lock);

         -          amdgpu_ih_clear_fault(adev, key);

                    return true;

         +  } else {

         +          r = amdgpu_vm_add_fault(vm->fault_hash, key);

         +

+ /* Hash table is full or the fault is alreadybeing processed,


         +           * ignore further page faults

         +           */

         +          if (r != 0) {

         + spin_unlock(&adev->vm_manager.pasid_lock);

         +                  goto ignore_iv;

         +          }

            }

/* No locking required with single writer and singlereader */


            r = kfifo_put(&vm->faults, key);

            if (!r) {

                    /* FIFO is full. Ignore it until there is space */

         +          amdgpu_vm_clear_fault(vm->fault_hash, key);

spin_unlock(&adev->vm_manager.pasid_lock);

         -          amdgpu_ih_clear_fault(adev, key);

                    goto ignore_iv;

            }

         - spin_unlock(&adev->vm_manager.pasid_lock);


         + spin_unlock(&adev->vm_manager.pasid_lock);

/* It's the first fault for this address, process itnormally */


            return true;

@@ -386,14 +387,6 @@ static int vega10_ih_sw_init(void*handle)


            adev->irq.ih.use_doorbell = true;

            adev->irq.ih.doorbell_index = AMDGPU_DOORBELL64_IH << 1;

- adev->irq.ih.faults =kmalloc(sizeof(*adev->irq.ih.faults), GFP_KERNEL);


         -  if (!adev->irq.ih.faults)

         -          return -ENOMEM;

         - INIT_CHASH_TABLE(adev->irq.ih.faults->hash,

         -                   AMDGPU_PAGEFAULT_HASH_BITS, 8, 0);

         - spin_lock_init(&adev->irq.ih.faults->lock);

         -  adev->irq.ih.faults->count = 0;

         -

            r = amdgpu_irq_init(adev);


            return r;

@@ -406,9 +399,6 @@ static int vega10_ih_sw_fini(void*handle)


            amdgpu_irq_fini(adev);

            amdgpu_ih_ring_fini(adev);


         -  kfree(adev->irq.ih.faults);

         -  adev->irq.ih.faults = NULL;

         -

            return 0;

          }






     _______________________________________________

     amd-gfx mailing list

amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>


     https://lists.freedesktop.org/mailman/listinfo/amd-gfx



_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/2] drm/amdgpu: Moved fault hash table to amdgpu vm

Reply via email to