The following RFC patchset initially enables the resource streamer on Haswell.

We can think of the resource streamer (RS) as a command streamer (CS)
accelerator: it accelerates certain commands that would normally take
time to build up and submit to the GPU, reducing some of the overhead
associated with such commands. On Haswell, generating binding tables
and constant buffers can be offloaded from CPU-generated commands to
the resource streamer.

This is a preparatory patchset that initially enables hardware-generated
binding tables, which are the primary prerequisite for RS-based
optimizations, e.g. constant buffer generation and other ways to reduce
command buffer submissions. This initial series is closely modeled
after how the i965 driver currently generates binding tables (see
below for a possible future optimization). Though it shaves a few
microseconds of CPU time off every command submission, I don't expect
it in its current form to produce large performance gains. The changes
improved GLB 2.5 by 0.19% (n=14).

In the hw-generated binding table case, the RS basically sits in front
of the CS, watching for the [VS/PS]BINDING_TABLE_POINTERS commands.
Once the RS encounters one, it flushes the state of the on-die binding
table entries to a buffer object, where the CS picks it up afterwards.
Each surface state and its associated index in the on-die binding
table state can be edited directly instead of generating the entire
binding table array in one go.

One optimization that we could possibly implement in the future is to
use the RS to publish deltas of changed surface states, so that we
wouldn't have to rebuild entire binding tables on every batchbuffer
flush. Currently, the i965 driver appends the VS/PS surface states at
the end of the batchbuffer. On every batchbuffer flush, the VS/PS
surface states and binding tables are rebuilt every time, for every
change. With the RS in mind, it would be possible to use a separate,
larger batchbuffer for (permanent?) surface state objects so the
generated surface state offsets would change less often [1].

With this series, GLB works fine and most piglit tests pass, but some
random GPU lockups may occur when piglit is run over a period of time.
intel_error_decode does not say specifically where in the batch the
problem lies. I'll spend some time nailing down this issue for the
next revision.

I'll post the libdrm and kernel portions that enable the RS bits on
MI_BATCH_BUFFER_START to the intel-gfx list.

[1] Needs changes in libdrm aperture checks to accommodate multiple
    levels of relocation. See
    http://lists.freedesktop.org/archives/mesa-dev/2013-May/039088.html

Abdiel Janulgue (12):
  intel: Add resource streamer control defines
  intel: On Haswell hardware, enable the resource streamer on batchbuffer start
  i965: Temporarily disable resource streamer when state base address is updated.
  i965: Add MI_RS_STORE_DATA_IMM workaround for 3DPRIMITIVE commands
  i965: Switch on hardware-generated binding tables.
  i965: Implement opcodes for the hw-generated binding table EDIT commands
  i965: Use hw-bt for pull constants and VS UBO surface states.
  i965: Use hw-bt for renderbuffer, constant, and texture surface states.
  i965: Flush on-chip binding table to pool
  i965: Use hw-bt for generated WM UBO surface states.
  i965/blorp: In blorp, update PS on-chip binding table when new surface state entries are generated
  i965/blorp: Add temporary work-around due to b607d57630daa7d92a84c41abfd45cacbe63f3d2

 src/mesa/drivers/dri/i965/brw_context.c           |   2 ++
 src/mesa/drivers/dri/i965/brw_context.h           |   1 +
 src/mesa/drivers/dri/i965/brw_defines.h           |   9 ++++++
 src/mesa/drivers/dri/i965/brw_draw.c              |  14 +++++++++
 src/mesa/drivers/dri/i965/brw_misc_state.c        |   7 +++++
 src/mesa/drivers/dri/i965/brw_state.h             |  13 +++++++++
 src/mesa/drivers/dri/i965/brw_state_upload.c      |   3 ++
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c  |  14 +++++++++
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  |   9 ++++++
 src/mesa/drivers/dri/i965/gen6_blorp.cpp          |  27 ++++++++++++++++-
 src/mesa/drivers/dri/i965/gen7_blorp.cpp          |   3 +-
 src/mesa/drivers/dri/i965/gen7_misc_state.c       | 109 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/mesa/drivers/dri/i965/gen7_vs_state.c         |  10 ++++---
 src/mesa/drivers/dri/i965/gen7_wm_state.c         |  10 ++++---
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |  36 +++++++++++++++++++----
 src/mesa/drivers/dri/i965/intel_batchbuffer.c     |   3 ++
 src/mesa/drivers/dri/i965/intel_reg.h             |   4 +++
 17 files changed, 259 insertions(+), 15 deletions(-)
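
As a rough illustration of the delta idea described in the cover letter
(this is not code from the series), the following C sketch models the
on-die binding table as a plain array. All names here (struct
binding_table, bt_rebuild, bt_edit, bt_apply_delta, bt_flush,
BT_ENTRIES) are hypothetical and do not correspond to the i965 driver
or to the actual hardware commands. The sketch only contrasts rewriting
every entry with editing the entries whose surface state offset
changed, followed by a flush to a pool that stands in for the buffer
object the CS reads.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table size, for illustration only. */
#define BT_ENTRIES 8

struct binding_table {
    uint32_t on_die[BT_ENTRIES]; /* stands in for the on-die binding table state */
    uint32_t pool[BT_ENTRIES];   /* stands in for the buffer object the CS reads */
    unsigned edits;              /* total number of entry writes so far */
};

/* Edit a single entry in place, by index. */
void bt_edit(struct binding_table *bt, unsigned index, uint32_t offset)
{
    assert(index < BT_ENTRIES);
    bt->on_die[index] = offset;
    bt->edits++;
}

/* Full rebuild: write every entry, whether or not it changed. */
void bt_rebuild(struct binding_table *bt, const uint32_t *offsets)
{
    for (unsigned i = 0; i < BT_ENTRIES; i++)
        bt_edit(bt, i, offsets[i]);
}

/* Delta update: write only the entries whose surface state offset
 * differs from what the table already holds. Returns the number of
 * entries that were edited. */
unsigned bt_apply_delta(struct binding_table *bt, const uint32_t *offsets)
{
    unsigned changed = 0;

    for (unsigned i = 0; i < BT_ENTRIES; i++) {
        if (bt->on_die[i] != offsets[i]) {
            bt_edit(bt, i, offsets[i]);
            changed++;
        }
    }
    return changed;
}

/* Models the flush that happens when a *_BINDING_TABLE_POINTERS
 * command is seen: copy the on-die state out to the pool. */
void bt_flush(struct binding_table *bt)
{
    memcpy(bt->pool, bt->on_die, sizeof(bt->pool));
}
```

Starting from a zero-initialized table, bt_rebuild() writes all
BT_ENTRIES entries, whereas a later bt_apply_delta() with one changed
offset writes a single entry. The pool only sees the new value after
bt_flush().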