On 12.08.2016 at 17:46, Alex Deucher wrote:
On Fri, Aug 12, 2016 at 9:52 AM, Christian König
<deathsim...@vodafone.de> wrote:
From: Christian König <christian.koe...@amd.com>
Write the PTEs at the end of the IB instead of directly into the SDMA commands.
This can save quite some CPU cycles building the entries.
Signed-off-by: Christian König <christian.koe...@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 2843132..7efcbe3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -910,15 +910,15 @@ static int amdgpu_vm_bo_update_mapping(struct
amdgpu_device *adev,
/* padding, etc. */
ndw = 64;
- if (params.src) {
+ if (src) {
/* only copy commands needed */
ndw += ncmds * 7;
- } else if (params.pages_addr) {
- /* header for write data commands */
- ndw += ncmds * 4;
+ } else if (pages_addr) {
+ /* copy commands needed */
+ ndw += ncmds * 7;
- /* body of write data command */
+ /* and also PTEs */
ndw += nptes * 2;
} else {
@@ -935,6 +935,22 @@ static int amdgpu_vm_bo_update_mapping(struct
amdgpu_device *adev,
params.ib = &job->ibs[0];
+ if (!src && pages_addr) {
+ uint64_t *pte;
+ unsigned i;
+
+ /* Put the PTEs at the end of the IB. */
+ i = ndw - nptes * 2;
+ pte= (uint64_t *)&(job->ibs->ptr[i]);
+ params.src = job->ibs->gpu_addr + i * 4;
Is the offset correct for all asics? IIRC, ndw was kind of a worst
case guess as the packet header sizes vary across families.
Yeah, that should work, but I can double-check once more.
I don't actually change the dw estimation. Instead of using the inline
write command, I first stitch together the page table entries at the end
of the IB and then use the copy command to move them over to the page
tables.
That has the clear advantage of being much more cache friendly, because
you don't jump around between different things any more.
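To make the offset arithmetic concrete, here is a minimal standalone
sketch of the staging step, not the driver code itself. The sketch_*
names, the identity-style GART lookup and the 4096-byte page size are
assumptions made up for illustration; only the 64 + ncmds * 7 +
nptes * 2 budget and the "PTEs in the last nptes * 2 dwords" layout
come from the patch. It uses memcpy() for the 64-bit stores so the
sketch stays portable when the first PTE dword index is odd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GPU_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical stand-in for the GART lookup: page-aligns the address. */
static uint64_t sketch_map_gart(uint64_t addr)
{
	return addr & ~(uint64_t)(SKETCH_GPU_PAGE_SIZE - 1);
}

/* Dword budget as in the patch: padding, 7 dw per copy command, 2 dw per PTE. */
static unsigned sketch_ndw(unsigned ncmds, unsigned nptes)
{
	return 64 + ncmds * 7 + nptes * 2;
}

/*
 * Write nptes 64-bit entries into the last nptes * 2 dwords of the IB and
 * return the GPU address a copy command would use as its source.
 */
static uint64_t sketch_stage_ptes(uint32_t *ib, uint64_t ib_gpu_addr,
				  unsigned ndw, unsigned nptes,
				  uint64_t addr, uint64_t flags)
{
	unsigned first, i;

	/* The PTE area must fit inside the dword budget. */
	assert(nptes * 2 <= ndw);
	first = ndw - nptes * 2; /* dword index of the first PTE */

	for (i = 0; i < nptes; ++i) {
		uint64_t pte = sketch_map_gart(addr +
				(uint64_t)i * SKETCH_GPU_PAGE_SIZE) | flags;

		/* memcpy avoids an unaligned 64-bit store when first is odd. */
		memcpy(&ib[first + 2 * i], &pte, sizeof(pte));
	}

	/* Each dword is 4 bytes, hence the * 4 for the byte offset. */
	return ib_gpu_addr + (uint64_t)first * 4;
}
```

Since the PTE area is carved out of the same ndw total that sized the
IB, the offset depends only on ndw and nptes and not on how many dwords
the command packets in front of it actually use, as long as the
estimate is not smaller than what the packets need.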
Christian.
Alex
+
+ for (i = 0; i < nptes; ++i) {
+ pte[i] = amdgpu_vm_map_gart(pages_addr, addr + i *
+ AMDGPU_GPU_PAGE_SIZE);
+ pte[i] |= flags;
+ }
+ }
+
r = amdgpu_sync_fence(adev, &job->sync, exclusive);
if (r)
goto error_free;
--
2.5.0
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx