On Fri, Apr 15, 2016 at 7:50 PM, Grigori Goronzy <g...@chown.ath.cx> wrote:
> Small IBs help to reduce stalls for workloads that require a lot of
> synchronization. On the other hand, if there is no notable
> synchronization, we can use a large IB size to slightly improve
> performance in some cases.
>
> This introduces tuning of the IB size based on feedback on the average
> buffer wait time. The average wait time is tracked with exponential
> smoothing.
> ---
>  src/gallium/winsys/amdgpu/drm/amdgpu_bo.c     | 2 ++
>  src/gallium/winsys/amdgpu/drm/amdgpu_cs.c     | 8 ++++++--
>  src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h | 1 +
>  3 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> index 036301e..1e441e5 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> @@ -195,6 +195,7 @@ static void *amdgpu_bo_map(struct pb_buffer *buf,
>                       return NULL;
>                    }
>                 }
> +               bo->ws->buffer_wait_time_avg = (3 * bo->ws->buffer_wait_time_avg) / 4;
>              } else {
>                 uint64_t time = os_time_get_nano();
>
> @@ -222,6 +223,7 @@ static void *amdgpu_bo_map(struct pb_buffer *buf,
>                 }
>
>                 bo->ws->buffer_wait_time += os_time_get_nano() - time;
> +               bo->ws->buffer_wait_time_avg = (3 * bo->ws->buffer_wait_time_avg + os_time_get_nano() - time) / 4;
>              }
>           }
>
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> index 3ea0f3d..a9af0ce 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> @@ -201,12 +201,16 @@ amdgpu_ctx_query_reset_status(struct radeon_winsys_ctx *rwctx)
>  static bool amdgpu_get_new_ib(struct radeon_winsys *ws, struct amdgpu_ib *ib,
>                                struct amdgpu_cs_ib_info *info, unsigned ib_type)
>  {
> +   unsigned buffer_size = 128 * 1024 * 4;
> +   unsigned ib_size = 32 * 1024 * 4;
> +
>     /* Small IBs are better than big IBs, because the GPU goes idle quicker
>      * and there is less waiting for buffers and fences. Proof:
>      *   http://www.phoronix.com/scan.php?page=article&item=mesa-111-si&num=1
>      */
> -   unsigned buffer_size = 128 * 1024 * 4;
> -   unsigned ib_size = 20 * 1024 * 4;
> +   uint64_t avg = ((struct amdgpu_winsys *)ws)->buffer_wait_time_avg;
> +   if (avg > 1E4)
> +      ib_size = 10 * 1024 * 4;

Some comment here wouldn't hurt. Also, that comparison could use an
integer constant (1E4 is a double, I think).

I like the idea.

Marek
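To make the integer-constant suggestion concrete, here is a minimal
standalone sketch. It is not the actual Mesa code: the helper names
(update_wait_avg, pick_ib_size) and the macro names are hypothetical, and
only the arithmetic and the sizes are taken from the quoted hunks. It
assumes the average is kept in nanoseconds, since the samples come from
os_time_get_nano().

```c
#include <stdint.h>

/* Hypothetical names; the values mirror the quoted patch. */
#define WAIT_AVG_THRESHOLD_NS 10000u          /* integer form of 1E4 */
#define IB_SIZE_LARGE         (32 * 1024 * 4) /* default when waits are rare */
#define IB_SIZE_SMALL         (10 * 1024 * 4) /* used when waits are frequent */

/* Exponential smoothing with weight 1/4 on the newest sample:
 * avg' = (3 * avg + sample) / 4. Passing sample_ns = 0 gives the decay
 * step the patch applies on the non-blocking map path. */
static uint64_t update_wait_avg(uint64_t avg_ns, uint64_t sample_ns)
{
    return (3 * avg_ns + sample_ns) / 4;
}

/* Choose the small IB size once the smoothed wait time exceeds the
 * threshold; the comparison stays entirely in integer arithmetic. */
static unsigned pick_ib_size(uint64_t avg_ns)
{
    return avg_ns > WAIT_AVG_THRESHOLD_NS ? IB_SIZE_SMALL : IB_SIZE_LARGE;
}
```

With "avg > 1E4" the uint64_t operand is converted to double before the
comparison. That gives the same result for values this small, but an
integer constant avoids the conversion and states the unit more clearly.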