Chris Wilson <ch...@chris-wilson.co.uk> writes:

> On Sat, Oct 03, 2015 at 05:57:05PM +0300, Francisco Jerez wrote:
>> Jordan Justen <jordan.l.jus...@intel.com> writes:
>>
>> > From: Francisco Jerez <curroje...@riseup.net>
>> >
>> > Fixes
>> > arb_shader_image_load_store/execution/load-from-cleared-image.shader_test
>> >
>> > Cc: Chris Wilson <ch...@chris-wilson.co.uk>
>> > Cc: Jason Ekstrand <jason.ekstr...@intel.com>
>> > Tested-by: Jordan Justen <jordan.l.jus...@intel.com>
>> > ---
>> > RE: i965: Perform an explicit flush after doing _mesa_meta_pbo_TexSubImage
>> >
>> > curro has some concerns about potential perf impact by this and
>> > wanted it to be checked on small-core w/CPU bound apps.
>> > Unfortunately, he is on vacation now.
>>
>> I've benchmarked this on VLV and none of the CPU-bound tests in the
>> Finnish benchmarking system regress significantly, with n=6 and 95%
>> confidence level, so s/RFC/PATCH/. I'll CC mesa-stable so it probably
>> makes sense to keep this independent from Chris' VBO resolve series.
>
> I ran this patch on bsw (and repeated it afresh just to be sure) using
> synmark:Ogl*:
>
> 6994ca2 glsl: fix whitespace
> synmark:OglBatch3: 277.02 (+0.00%): min/p50/90/95/99/max/std =
>   271.902 / 277.024 / 277.952 / 278.117 / 278.233 / 278.409 / 1.27475 n=30
> synmark:OglBatch3:cpu: 434.92 (+0.00%): min/p50/90/95/99/max/std =
>   429.869 / 434.755 / 437.818 / 438.236 / 439.022 / 439.251 / 2.19871 n=30
> synmark:OglBatch4: 154.76 (+0.00%): min/p50/90/95/99/max/std =
>   153.394 / 154.75 / 155.636 / 155.643 / 156.089 / 156.172 / 0.721397 n=30
> synmark:OglBatch4:cpu: 176.84 (+0.00%): min/p50/90/95/99/max/std =
>   176.239 / 176.838 / 177.068 / 177.394 / 177.451 / 177.46 / 0.273547 n=30
> synmark:OglBatch5: 46.59 (+0.00%): min/p50/90/95/99/max/std =
>   45.9918 / 46.6053 / 46.819 / 46.842 / 46.8459 / 46.8538 / 0.26363 n=30
> synmark:OglBatch5:cpu: 52.79 (+0.00%): min/p50/90/95/99/max/std =
>   52.1812 / 52.6714 / 53.3544 / 53.3605 / 53.3726 / 53.4059 / 0.402148 n=30
> synmark:OglBatch6: 11.95 (+0.00%): min/p50/90/95/99/max/std =
>   11.7449 / 11.9523 / 12.0025 / 12.0026 / 12.0097 / 12.0304 / 0.0771611 n=30
> synmark:OglBatch6:cpu: 14.17 (+0.00%): min/p50/90/95/99/max/std =
>   14.0292 / 14.169 / 14.1863 / 14.1889 / 14.1963 / 14.1999 / 0.0387371 n=30
> synmark:OglBatch7: 3.04 (+0.00%): min/p50/90/95/99/max/std =
>   3.00939 / 3.03578 / 3.05513 / 3.05555 / 3.05582 / 3.05847 / 0.0148493 n=30
> synmark:OglBatch7:cpu: 3.66 (+0.00%): min/p50/90/95/99/max/std =
>   3.63355 / 3.66219 / 3.66685 / 3.66706 / 3.66789 / 3.66943 / 0.00683119 n=30
> Patched
> synmark:OglBatch3: 276.06 (-0.35%): min/p50/90/95/99/max/std =
>   269.608 / 276.098 / 277.25 / 277.354 / 277.967 / 278.013 / 2.0933 n=30
> synmark:OglBatch3:cpu: 415.49 (-4.47%): min/p50/90/95/99/max/std =
>   412.316 / 415.554 / 417.828 / 417.9 / 418.471 / 419.22 / 1.84629 n=30
> synmark:OglBatch4: 144.26 (-6.78%): min/p50/90/95/99/max/std =
>   143.126 / 144.188 / 145.026 / 145.114 / 145.126 / 145.356 / 0.527859 n=30
> synmark:OglBatch4:cpu: 161.82 (-8.49%): min/p50/90/95/99/max/std =
>   161.247 / 161.82 / 162.12 / 162.169 / 162.172 / 162.222 / 0.254633 n=30
> synmark:OglBatch5: 42.44 (-8.91%): min/p50/90/95/99/max/std =
>   41.856 / 42.4856 / 42.7209 / 42.8424 / 42.8436 / 42.8441 / 0.287101 n=30
> synmark:OglBatch5:cpu: 47.90 (-9.27%): min/p50/90/95/99/max/std =
>   47.4268 / 47.7758 / 48.4164 / 48.4775 / 48.5086 / 48.5284 / 0.341355 n=30
> synmark:OglBatch6: 10.86 (-9.09%): min/p50/90/95/99/max/std =
>   10.7564 / 10.8818 / 10.9238 / 10.926 / 10.9279 / 10.9535 / 0.0619808 n=30
> synmark:OglBatch6:cpu: 12.80 (-9.62%): min/p50/90/95/99/max/std =
>   12.7149 / 12.8064 / 12.8179 / 12.8228 / 12.8249 / 12.8255 / 0.0235037 n=30
> synmark:OglBatch7: 2.76 (-9.02%): min/p50/90/95/99/max/std =
>   2.74078 / 2.7634 / 2.77936 / 2.78025 / 2.78254 / 2.7827 / 0.0126239 n=30
> synmark:OglBatch7:cpu: 3.29 (-10.12%): min/p50/90/95/99/max/std =
>   3.26676 / 3.29127 / 3.29445 / 3.29464 / 3.29472 / 3.29616 / 0.0072781 n=30
>
>
> 6994ca2 glsl: fix whitespace
> synmark:OglBatch3: 276.90 (+0.00%): min/p50/90/95/99/max/std =
>   274.104 / 276.81 / 277.697 / 278.063 / 278.067 / 278.505 / 0.914328 n=30
> synmark:OglBatch3:cpu: 434.96 (+0.00%): min/p50/90/95/99/max/std =
>   429.492 / 434.784 / 437.174 / 437.482 / 437.548 / 439.812 / 2.09205 n=30
> synmark:OglBatch4: 154.06 (+0.00%): min/p50/90/95/99/max/std =
>   152.336 / 153.995 / 155.37 / 155.446 / 155.544 / 155.636 / 0.919322 n=30
> synmark:OglBatch4:cpu: 176.45 (+0.00%): min/p50/90/95/99/max/std =
>   175.959 / 176.435 / 176.686 / 176.718 / 176.892 / 176.9 / 0.247188 n=30
> synmark:OglBatch5: 45.88 (+0.00%): min/p50/90/95/99/max/std =
>   45.2706 / 45.7631 / 46.6576 / 46.6662 / 46.6929 / 46.7339 / 0.485474 n=30
> synmark:OglBatch5:cpu: 52.65 (+0.00%): min/p50/90/95/99/max/std =
>   52.025 / 52.5863 / 53.0849 / 53.0974 / 53.1337 / 53.1717 / 0.306497 n=30
> synmark:OglBatch6: 11.88 (+0.00%): min/p50/90/95/99/max/std =
>   11.7448 / 11.8725 / 11.9216 / 11.9337 / 11.935 / 11.9444 / 0.0534648 n=30
> synmark:OglBatch6:cpu: 14.13 (+0.00%): min/p50/90/95/99/max/std =
>   14.0528 / 14.1313 / 14.1533 / 14.1584 / 14.1585 / 14.1721 / 0.0227182 n=30
> synmark:OglBatch7: 3.02 (+0.00%): min/p50/90/95/99/max/std =
>   2.99852 / 3.02145 / 3.04142 / 3.04273 / 3.04422 / 3.04584 / 0.0141605 n=30
> synmark:OglBatch7:cpu: 3.66 (+0.00%): min/p50/90/95/99/max/std =
>   3.65238 / 3.66158 / 3.66565 / 3.66662 / 3.66722 / 3.66726 / 0.00423393 n=30
> Patched
> synmark:OglBatch3: 275.73 (-0.42%): min/p50/90/95/99/max/std =
>   269.989 / 275.696 / 277.249 / 277.262 / 277.306 / 277.577 / 1.59707 n=30
> synmark:OglBatch3:cpu: 412.83 (-5.09%): min/p50/90/95/99/max/std =
>   410.283 / 412.828 / 415.52 / 415.523 / 416.087 / 416.527 / 1.66123 n=30
> synmark:OglBatch4: 144.29 (-6.34%): min/p50/90/95/99/max/std =
>   142.661 / 144.363 / 145.052 / 145.105 / 145.147 / 145.276 / 0.763492 n=30
> synmark:OglBatch4:cpu: 161.53 (-8.45%): min/p50/90/95/99/max/std =
>   160.928 / 161.522 / 161.847 / 162.003 / 162.048 / 162.116 / 0.263058 n=30
> synmark:OglBatch5: 41.75 (-9.01%): min/p50/90/95/99/max/std =
>   41.3497 / 41.7404 / 41.9044 / 41.9791 / 42.0902 / 42.1262 / 0.166136 n=30
> synmark:OglBatch5:cpu: 47.91 (-9.02%): min/p50/90/95/99/max/std =
>   47.53 / 47.8143 / 48.3434 / 48.3721 / 48.4509 / 48.4631 / 0.292788 n=30
> synmark:OglBatch6: 10.89 (-8.27%): min/p50/90/95/99/max/std =
>   10.7444 / 10.8922 / 10.9323 / 10.9384 / 10.9477 / 10.9482 / 0.0535149 n=30
> synmark:OglBatch6:cpu: 12.78 (-9.56%): min/p50/90/95/99/max/std =
>   12.6762 / 12.7798 / 12.7961 / 12.7967 / 12.7981 / 12.8059 / 0.0355064 n=30
> synmark:OglBatch7: 2.76 (-8.65%): min/p50/90/95/99/max/std =
>   2.74121 / 2.76015 / 2.77461 / 2.77479 / 2.77892 / 2.77992 / 0.0111074 n=30
> synmark:OglBatch7:cpu: 3.29 (-10.07%): min/p50/90/95/99/max/std =
>   3.26976 / 3.29293 / 3.29758 / 3.29878 / 3.29917 / 3.29936 / 0.00586701 n=30
>
> nothing else stood out from the noise. (The cpu variants are with INTEL_NO_HW=1.)
> -Chris
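
For readers who want to check whether a delta like the ones quoted above stands out from run-to-run noise, here is a minimal sketch in C. It is an assumption on my part, not something stated in the thread: the thread does not say which statistical test the benchmarking system applies, so Welch's t statistic is used here purely as one common choice. The names `run_summary`, `percent_change`, `welch_t` and `sketch_sqrt` are hypothetical, and the inputs are the mean, std and n values as printed in the synmark output.

```c
#include <assert.h>

/* Summary of one benchmark run as printed in the quoted output:
 * mean, sample standard deviation and number of samples.
 * (Hypothetical type, introduced only for this sketch.) */
struct run_summary {
   double mean;
   double stddev;
   int n;
};

/* Square root via Newton's method, so the sketch builds without libm. */
static double
sketch_sqrt(double x)
{
   assert(x >= 0.0);
   if (x == 0.0)
      return 0.0;

   double r = x > 1.0 ? x : 1.0;
   for (int i = 0; i < 100; i++)
      r = 0.5 * (r + x / r);
   return r;
}

/* Relative change of the patched mean against the baseline, in percent. */
double
percent_change(const struct run_summary *base,
               const struct run_summary *patched)
{
   assert(base->mean != 0.0);
   return 100.0 * (patched->mean - base->mean) / base->mean;
}

/* Welch's t statistic for two runs with possibly unequal variances.
 * A magnitude well above roughly 2 suggests a difference at the 95%
 * level for sample counts like n=30; this is an assumed criterion,
 * not the one used by any particular benchmarking system. */
double
welch_t(const struct run_summary *base,
        const struct run_summary *patched)
{
   assert(base->n > 1 && patched->n > 1);

   double var_of_diff = base->stddev * base->stddev / base->n +
                        patched->stddev * patched->stddev / patched->n;
   return (patched->mean - base->mean) / sketch_sqrt(var_of_diff);
}
```

Fed with the first OglBatch7:cpu pair from the output (3.66 with std 0.00683119 against 3.29 with std 0.0072781, n=30 each), `percent_change` gives about -10.1%, in line with the reported -10.12% (which was presumably computed from unrounded means), and the magnitude of `welch_t` is far above 2.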
I don't see anything like that on VLV even after going up to n=30. What
compiler options did you use to build mesa? Does the attached change
have any effect on the results? Do you see a comparable regression
after moving the image resolves to the VBO hook you added in your other
series?

BTW please send any absolute BSW FPS results to me in private rather
than to the public mailing list...

>
> --
> Chris Wilson, Intel Open Source Technology Centre
diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index b6b8262..50788fd 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -191,17 +191,20 @@ intel_update_state(struct gl_context * ctx, GLuint new_state)
 
    /* Resolve color for each active shader image. */
    for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
-      const struct gl_shader *shader = ctx->_Shader->CurrentProgram[i] ?
-         ctx->_Shader->CurrentProgram[i]->_LinkedShaders[i] : NULL;
+      const struct gl_shader_program *prog = ctx->_Shader->CurrentProgram[i];
 
-      if (unlikely(shader && shader->NumImages)) {
-         for (unsigned j = 0; j < shader->NumImages; j++) {
-            struct gl_image_unit *u = &ctx->ImageUnits[shader->ImageUnits[j]];
-            tex_obj = intel_texture_object(u->TexObj);
+      if (unlikely(prog)) {
+         const struct gl_shader *shader = prog->_LinkedShaders[i];
 
-            if (tex_obj && tex_obj->mt) {
-               intel_miptree_resolve_color(brw, tex_obj->mt);
-               brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
+         if (unlikely(shader && shader->NumImages)) {
+            for (unsigned j = 0; j < shader->NumImages; j++) {
+               struct gl_image_unit *u = &ctx->ImageUnits[shader->ImageUnits[j]];
+               tex_obj = intel_texture_object(u->TexObj);
+
+               if (tex_obj && tex_obj->mt) {
+                  intel_miptree_resolve_color(brw, tex_obj->mt);
+                  brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
+               }
             }
          }
       }
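
To make the shape of the attached change easier to see outside the driver, here is a standalone model of the two loop structures. Everything in it is a hypothetical stand-in: `sketch_shader`, `sketch_program`, `SKETCH_STAGES` and both functions are invented for illustration and are not Mesa types or APIs, and counting images replaces the real per-image resolve and flush calls. Under those assumptions, the model only shows that hoisting the program NULL check into its own branch leaves the set of visited images unchanged; it says nothing about the performance effect being asked about.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_STAGES 3   /* hypothetical stand-in for MESA_SHADER_STAGES */

/* Hypothetical, heavily simplified stand-ins for the Mesa structures. */
struct sketch_shader {
   unsigned num_images;
};

struct sketch_program {
   struct sketch_shader *linked_shaders[SKETCH_STAGES];
};

/* Structure before the change: a single conditional expression picks
 * the linked shader, or NULL when no program is bound for the stage. */
unsigned
count_resolves_before(struct sketch_program *const current[SKETCH_STAGES])
{
   unsigned resolves = 0;

   for (unsigned i = 0; i < SKETCH_STAGES; i++) {
      const struct sketch_shader *shader =
         current[i] ? current[i]->linked_shaders[i] : NULL;

      if (shader && shader->num_images)
         resolves += shader->num_images;   /* models one resolve per image */
   }
   return resolves;
}

/* Structure after the change: test the program pointer first and only
 * then look at the linked shader for that stage. */
unsigned
count_resolves_after(struct sketch_program *const current[SKETCH_STAGES])
{
   unsigned resolves = 0;

   for (unsigned i = 0; i < SKETCH_STAGES; i++) {
      const struct sketch_program *prog = current[i];

      if (prog) {
         const struct sketch_shader *shader = prog->linked_shaders[i];

         if (shader && shader->num_images)
            resolves += shader->num_images;   /* models one resolve per image */
      }
   }
   return resolves;
}
```

Both functions return the same count for any input, including stages with no bound program and programs with no linked shader for a stage.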
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev