Am 12.08.2016 um 17:46 schrieb Alex Deucher:
On Fri, Aug 12, 2016 at 9:52 AM, Christian König
<deathsim...@vodafone.de> wrote:
From: Christian König <christian.koe...@amd.com>

Write the PTEs at the end of the IB instead of directly into the SDMA commands.
This can save quite some CPU cycles building the entries.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 26 +++++++++++++++++++++-----
  1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 2843132..7efcbe3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -910,15 +910,15 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
         /* padding, etc. */
         ndw = 64;

-       if (params.src) {
+       if (src) {
                 /* only copy commands needed */
                 ndw += ncmds * 7;

-       } else if (params.pages_addr) {
-               /* header for write data commands */
-               ndw += ncmds * 4;
+       } else if (pages_addr) {
+               /* copy commands needed */
+               ndw += ncmds * 7;

-               /* body of write data command */
+               /* and also PTEs */
                 ndw += nptes * 2;

         } else {
@@ -935,6 +935,22 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,

         params.ib = &job->ibs[0];

+       if (!src && pages_addr) {
+               uint64_t *pte;
+               unsigned i;
+
+               /* Put the PTEs at the end of the IB. */
+               i = ndw - nptes * 2;
+               pte= (uint64_t *)&(job->ibs->ptr[i]);
+               params.src = job->ibs->gpu_addr + i * 4;
Is the offset correct for all asics?  IIRC, ndw was kind of a worst
case guess as the packet header sizes vary across families.

Yeah, that should work, but I can double check once more.

I actually don't change the dw estimation. Just instead of using the inline write command I stitch together the page table entries at the end of dw first and then use the copy command to move them over to the page tables.

That has the clear advantage of being way more cache friendly, because you don't jump around between dithings any more.

Christian.


Alex

+
+               for (i = 0; i < nptes; ++i) {
+                       pte[i] = amdgpu_vm_map_gart(pages_addr, addr + i *
+                                                   AMDGPU_GPU_PAGE_SIZE);
+                       pte[i] |= flags;
+               }
+       }
+
         r = amdgpu_sync_fence(adev, &job->sync, exclusive);
         if (r)
                 goto error_free;
--
2.5.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to