This patchset implements offchip tessellation, after which we can finally process more than one patch per wave without decreasing TessMark scores.

For TessMark this improves performance by ~20% for the x32 case and ~80% for the x64 case. x8 and x16 have roughly the same performance as before. Unigine Heaven gets 43 fps compared to 28 before (roughly +50%). For comparison, amdgpu-pro gets 44 fps in Heaven. For Shadow of Mordor the performance changes from 28 fps to 40 fps (roughly +40%).

Remaining ideas for improvement are:

- Don't store TCS outputs to LDS and don't unnecessarily allocate LDS. This has pretty much no measurable effect in the games I tried.

- Only store TCS outputs to memory when the tess factors exceed a threshold. I haven't been able to get the LDS case working with dynamic HS enabled, but the decompiled amdgpu-pro shaders give a very strong hint that this is possible. However, amdgpu-pro sets the threshold to -1, so it pretty much always stores to memory too, as far as I can see. Maybe it does not work on VI, or there is some interaction with the VI-only distribution modes and these were considered more profitable.

- Hardware swizzled buffers. The swizzling by hand I use results in extra VALU instructions, and it would be nice if we did not need them. However, my attempts have not resulted in a performance improvement yet.

I have run the piglit gpu suite and found no regressions on a Tonga card.

Bas Nieuwenhuizen (14):
  radeonsi: Add buffer for offchip storage between TCS and TES.
  radeonsi: Add offchip tessellation parameters.
  radeonsi: Define build_tbuffer_store_dwords earlier to support new users.
  radeonsi: Add buffer load functions.
  radeonsi: Use correct parameter index for LS_OUT_LAYOUT.
  radeonsi: Add user SGPR for the layout of the offchip buffer.
  radeonsi: Add offchip buffer address calculation.
  radeonsi: Store inputs to memory when not using a TCS.
  radeonsi: Use buffer loads and stores for passing data from TCS to TES.
  radeonsi: Remove LDS layout user SGPR's from TES.
  radeonsi: Enable dynamic HS.
  radeonsi: Use barrier instructions for TCS barriers.
  radeonsi: Process multiple patches per threadgroup.
  radeonsi: Allow TES distribution between shader engines.

 src/gallium/drivers/radeonsi/si_pipe.c          |   1 +
 src/gallium/drivers/radeonsi/si_pipe.h          |   1 +
 src/gallium/drivers/radeonsi/si_shader.c        | 567 ++++++++++++++++++------
 src/gallium/drivers/radeonsi/si_shader.h        |  32 +-
 src/gallium/drivers/radeonsi/si_state.c         |   5 +
 src/gallium/drivers/radeonsi/si_state.h         |   1 +
 src/gallium/drivers/radeonsi/si_state_draw.c    |  59 ++-
 src/gallium/drivers/radeonsi/si_state_shaders.c |  67 ++-
 8 files changed, 560 insertions(+), 173 deletions(-)

-- 
2.8.2