This is a respin of a series I sent nearly two years ago reimplementing uniform pull constant loads in terms of constant cache block read messages instead of using sampler LD messages. The motivation is that oword block read messages are able to fetch more data with a single message than the current SIMD4x2 sampler LD messages, and they don't contribute to thrashing of the sampler caches, which can lead to performance problems with several workloads. Here is a summary of the benchmarks that are improved by this series along with an estimate of their standard deviation (see PATCH 6 for more details):

                          |      SKL       |     BDW      |      HSW
 SynMark2 OglShMapPcf     | 24.63% ±0.45%  | 4.01% ±0.70% | 10.31% ±0.38%
 GfxBench4 gl_manhattan31 |  5.93% ±0.35%  | 3.92% ±0.31% |  6.62% ±0.22%
 GfxBench4 gl_4           |  2.52% ±0.44%  | 1.23% ±0.10% |  N/A
 Unigine Valley           |  0.83% ±0.17%  | 0.23% ±0.05% |  0.74% ±0.45%

I'm resending the series since Mark pointed out that the i965 driver generates more sampler traffic than the proprietary driver during some expensive draw calls of the Manhattan demo, while its non-sampler shader memory access counts are lower (in fact zero). The original Manhattan demo I tried two years ago wasn't affected by the change because it didn't make use of UBOs at all, but the newer gl_manhattan31 demo based on GL 4.3/GLES 3.1 does, as you can tell from the table above.

The series should be roughly functionally equivalent to the last revision, but rebased two years forward in time, which involved nearly rewriting some of the patches. While doing so I made things slightly more flexible: the back-end can now specify the oword read block size arbitrarily, which should make it easier to switch to a larger block size in the future, or to a smaller one in order to minimize register pressure.
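
To make the block-size trade-off concrete, here is a minimal sketch of the addressing arithmetic involved. It is not code from the series: the helper names are hypothetical, and the figures used in the note below (16 bytes fetched per SIMD4x2 sampler LD message, a 64-byte cacheline per block read) are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from the series: describes where a
// 32-bit constant lives relative to the aligned block that contains it.
struct BlockRef {
   uint32_t base;   // byte offset of the aligned block
   uint32_t dword;  // index of the 32-bit component inside the block
};

// Split a constant byte offset into an aligned block base plus a dword
// index.  The block size is assumed to be a power of two of at least
// 4 bytes (e.g. 64 for one cacheline).
BlockRef
split_pull_constant_offset(uint32_t byte_offset, uint32_t block_size)
{
   assert(block_size >= 4 && (block_size & (block_size - 1)) == 0);
   const uint32_t base = byte_offset & ~(block_size - 1);
   return { base, (byte_offset - base) / 4 };
}

// Number of fetches needed to cover `bytes` contiguous bytes starting
// at an aligned offset, given how many bytes one message returns.
uint32_t
fetches_needed(uint32_t bytes, uint32_t bytes_per_fetch)
{
   assert(bytes_per_fetch > 0);
   return (bytes + bytes_per_fetch - 1) / bytes_per_fetch;
}
```

Under those assumptions, split_pull_constant_offset(100, 64) yields block base 64 and dword index 9, and covering 256 contiguous bytes of constants takes fetches_needed(256, 16) = 16 messages at 16 bytes each versus fetches_needed(256, 64) = 4 at one cacheline each. A larger block means fewer messages but more registers held live per fetch, which is the trade-off mentioned above.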

 src/mesa/drivers/dri/i965/brw_defines.h          |   7 ++++++-
 src/mesa/drivers/dri/i965/brw_disasm.c           |   1 +
 src/mesa/drivers/dri/i965/brw_eu.h               |   1 +
 src/mesa/drivers/dri/i965/brw_eu_emit.c          |  97 +++++++++++++++++++++++++++++++++++++++----------------------------------------------------------
 src/mesa/drivers/dri/i965/brw_fs.cpp             |  63 +++++++++++++++++---------------------------------------------
 src/mesa/drivers/dri/i965/brw_fs.h               |   5 +----
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 108 ++++++++++++++++++++++++------------------------------------------------------------------------------------
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp         |  19 +++++++++++--------
 src/mesa/drivers/dri/i965/brw_pipe_control.c     |   1 +
 src/mesa/drivers/dri/i965/brw_shader.cpp         |   2 --
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  15 ++++++++++++---
 11 files changed, 113 insertions(+), 206 deletions(-)

[PATCH 1/9] i965/gen6+: Invalidate constant cache on brw_emit_mi_flush().
[PATCH 2/9] i965: Let the caller of brw_set_dp_write/read_message control the target cache.
[PATCH 3/9] i965/fs: Switch to the constant cache for uniform pull constants.
[PATCH 4/9] i965: Factor out oword block read and write message control calculation.
[PATCH 5/9] i965/fs: Expose arbitrary pull constant load sizes to the IR.
[PATCH 6/9] i965/fs: Fetch one cacheline of pull constants at a time.
[PATCH 7/9] i965/fs: Drop useless access mode override from pull constant generator code.
[PATCH 8/9] i965/fs: Remove the FS_OPCODE_SET_SIMD4X2_OFFSET virtual opcode.
[PATCH 9/9] i965/disasm: Decode dataport constant cache control fields.