On 02/18/2016 01:47 PM, Ian Romanick wrote: > On 02/18/2016 08:44 AM, Neil Roberts wrote: >> I made a pathological test case (attached) which repeatedly renders into >> an MSAA FBO and then blits it to the screen and measures the framerate. >> It checks it with a range of different sample counts. The rendering is >> done either by rendering two triangles to fill the framebuffer or by >> calling glClear. >> >> The percentage increase in framerate after applying the patch is like >> this: >> >> With triangles to fill buffer: >> >> 16 62,27% >> 8 48,59% >> 4 27,72% >> 2 5,34% >> 0 0,58% >> >> With glClear: >> >> 16 -5,20% >> 8 -7,08% >> 4 -2,45% >> 2 -20,76% >> 0 3,71% > > I find this result interesting. In the samples=0 case, the before and > after shaders should be identical. I dug into this a bit, and I think > the problem is the previous patch. Using do { ... } while (false) in > the macro makes the compiler generate some really, really bad code. > Just deleting those two lines makes a huge difference. > > With do { ... } while (false): > > SIMD8 shader: 228 instructions. 4 loops. 1554 cycles. 0:0 spills:fills. > Promoted 0 constants. Compacted 3648 to 2192 bytes (40%) > > Without: > > SIMD8 shader: 159 instructions. 0 loops. 380 cycles. 0:0 spills:fills. > Promoted 0 constants. Compacted 2544 to 1600 bytes (37%) > SIMD16 shader: 159 instructions. 0 loops. 898 cycles. 0:0 spills:fills. > Promoted 0 constants. Compacted 2544 to 1600 bytes (37%) > > This is for samples=16 and sampler2DMS. The cycle counts are bogus > because the 4 loops have their cycle counts statically multiplied by > 10... even though the loop will only execute once. > > So... I'll try some more experiments. I also wonder about changing the > SAMPLES > 1 to SAMPLES > 2.
This is weird, and I'm not sure what to make of it. I ran your test at a bunch of commits on my branch. Getting rid of gl_FragColor was the big win. The GL_EXT_shader_samples_identical patch is still the big loss. I'm assuming that causes enough register pressure over the previous commit to lose SIMD16. The following patch just removes the "do {" and "} while (false)". That gets back some of the losses, but not all of it. The raw per-commit data is below. 5c7f974 Android: disable unused-parameter warning With triangles to fill buffer: 8,419.603882 4,633.834045 2,595.628113 0,795.798157 With glClear: 8,450.937958 4,594.000610 2,641.354553 0,767.400818 89b8c0a meta/blit: Replace the gl_FragColor write With triangles to fill buffer: 8,597.943054 4,803.083862 2,942.862549 0,1129.432983 With glClear: 8,643.500671 4,980.392151 2,982.704407 0,1116.569946 7025d43 meta/blit: Don't dynamically select the MSAA "merge" function With triangles to fill buffer: 8,599.412598 4,954.745056 2,933.184021 0,1129.305420 With glClear: 8,582.106079 4,965.810303 2,975.419434 0,1169.180420 b895a4d meta/blit: Use a single, master shader template for MSAA-SS blits With triangles to fill buffer: 8,591.786011 4,932.835815 2,910.498047 0,1072.501099 With glClear: 8,623.441406 4,952.562378 2,941.442261 0,1081.314819 197e4be meta/blit: Use GL_EXT_shader_samples_identical in MSAA-SS resolve blit With triangles to fill buffer: 8,559.252869 4,795.355103 2,781.494202 0,941.619568 With glClear: 8,577.400513 4,806.321533 2,811.030029 0,957.579224 ec01613 fix loop nonsense derp With triangles to fill buffer: 8,559.221558 4,788.767944 2,786.287170 0,1078.632324 With glClear: 8,575.771545 4,956.205750 2,813.206482 0,1111.111084 > I will probably also make a patch to fix the damage done by the do { ... > } while (false). That's just comedy. > >> It seems like a pretty convincing win for the triangle case but the >> clear case makes it slightly worse. Presumably this is because we don't >> do anything to detect the value stored in the MCS buffer when doing a >> fast clear so the fast path isn't taken and the shader being more >> complicated makes it slower. >> >> Not sure if we want to try and do anything about that because >> potentially the cleared pixels aren't very common in a framebuffer from >> a real use case so it might not really matter. >> >> Currently we don't use SIMD16 for 16x MSAA because we can't allocate the >> registers well enough to make it worthwhile. This patch makes that >> problem a bit more interesting because even if we end up spilling a lot >> it might still be worth doing SIMD16 because the cases where the spilled >> instructions are hit would be much less common. >> >> - Neil >> >> Ian Romanick <i...@freedesktop.org> writes: >> >>> From: Ian Romanick <ian.d.roman...@intel.com> >>> >>> Somewhat surprisingly, this didn't have any affect on performance in the >>> benchmarks that Martin tried for me. >>> >>> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com> >>> --- >>> src/mesa/drivers/common/meta_blit.c | 10 +++++++++- >>> 1 file changed, 9 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/mesa/drivers/common/meta_blit.c >>> b/src/mesa/drivers/common/meta_blit.c >>> index 28aabd3..c0ec51f 100644 >>> --- a/src/mesa/drivers/common/meta_blit.c >>> +++ b/src/mesa/drivers/common/meta_blit.c >>> @@ -530,6 +530,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, >>> fs_source = ralloc_asprintf(mem_ctx, >>> "#version 130\n" >>> "#extension >>> GL_ARB_texture_multisample: require\n" >>> + "#extension >>> GL_EXT_shader_samples_identical: enable\n" >>> "#define gvec4 %svec4\n" >>> "uniform %ssampler2DMS%s >>> texSampler;\n" >>> "in %s texCoords;\n" >>> @@ -569,7 +570,14 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, >>> " i%s tc = i%s(texCoords);\n" >>> " int i;\n" >>> "\n" >>> - " for (i = 0; i < SAMPLES; i++)\n" >>> + " S[0] = texelFetch(texSampler, tc, >>> 0);\n" >>> + "#if >>> defined(GL_EXT_shader_samples_identical) && SAMPLES > 1\n" >>> + " if >>> (textureSamplesIdenticalEXT(texSampler, tc)) {\n" >>> + " emit2(S[0]);\n" >>> + " return;\n" >>> + " }\n" >>> + "#endif\n" >>> + " for (i = 1; i < SAMPLES; i++)\n" >>> " S[i] = texelFetch(texSampler, >>> tc, i);\n" >>> "\n" >>> " REDUCE(s16, s32);\n" >>> -- >>> 2.5.0 > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev