Hi Jason,

On Mon, 2015-11-16 at 07:50 -0800, Jason Ekstrand wrote:
>
> On Nov 16, 2015 2:01 AM, "Iago Toral" <ito...@igalia.com> wrote:
> >
> > On Fri, 2015-11-13 at 07:34 -0800, Jason Ekstrand wrote:
> > >
> > > On Nov 13, 2015 5:53 AM, "Iago Toral" <ito...@igalia.com> wrote:
> > > >
> > > > On Wed, 2015-11-11 at 17:26 -0800, Jason Ekstrand wrote:
> > > > > ---
> > > > >  src/mesa/drivers/dri/i965/brw_fs.cpp              | 11 +++++++++--
> > > > >  src/mesa/drivers/dri/i965/brw_nir.c               |  1 -
> > > > >  src/mesa/drivers/dri/i965/brw_vec4.cpp            |  5 ++++-
> > > > >  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |  6 +++++-
> > > > >  4 files changed, 18 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > > > index ad94fa4..b8713ab 100644
> > > > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > > > @@ -43,6 +43,7 @@
> > > > >  #include "brw_wm.h"
> > > > >  #include "brw_fs.h"
> > > > >  #include "brw_cs.h"
> > > > > +#include "brw_nir.h"
> > > > >  #include "brw_vec4_gs_visitor.h"
> > > > >  #include "brw_cfg.h"
> > > > >  #include "brw_dead_control_flow.h"
> > > > > @@ -5459,13 +5460,16 @@ brw_compile_fs(const struct brw_compiler *compiler, void *log_data,
> > > > >                 void *mem_ctx,
> > > > >                 const struct brw_wm_prog_key *key,
> > > > >                 struct brw_wm_prog_data *prog_data,
> > > > > -               const nir_shader *shader,
> > > > > +               const nir_shader *src_shader,
> > > > >                 struct gl_program *prog,
> > > > >                 int shader_time_index8, int shader_time_index16,
> > > > >                 bool use_rep_send,
> > > > >                 unsigned *final_assembly_size,
> > > > >                 char **error_str)
> > > > >  {
> > > > > +   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> > > > > +   brw_postprocess_nir(shader, compiler->devinfo, true);
> > > > > +
> > > >
> > > > Maybe it is a silly question, but why do we need to clone the
> > > > shader to do this?
> > >
> > > Because brw_compile_foo may be called multiple times on the same
> > > shader source.  Since brw_postprocess_nir alters the shader source,
> > > we need to make a copy.
> >
> > Ok, trying to see if I get the big picture of the series:
> >
> > So the situation before this change is that we were running
> > brw_postprocess_nir in brw_create_nir (so at link-time) before we ran
> > brw_compile_foo (at codegen/drawing time), and thus, we never had this
> > problem. We still had to fix codegen for texture rectangle when
> > drawing though, which we were doing with rescale_texcoord().
> >
> > With this change, we handle texture rectangle in brw_postprocess_nir()
> > so we don't need rescale_texcoord() any more. However, this needs to
> > be done at codegen time anyway, so as a consequence, now we have to
> > move all of brw_postprocess_nir there and we have to clone the NIR
> > shader.
> >
> > Did I get it right?
>
> It actually happens in brw_apply_foo_key, but yes.
>
> > If I did then I wonder about the performance impact of this change,
> > since codegen happens when we draw and there are plenty of things
> > happening in brw_postprocess_nir (plus the cloning). Is it worth it?
>
> That's a very hard question to answer definitively.  However, here are
> a few data-points:
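For reference, the clone-before-postprocess reasoning above can be sketched in a few lines of C. This is only an illustration under stated assumptions: toy_shader, toy_shader_clone, toy_postprocess and toy_compile are hypothetical stand-ins, not the real NIR or i965 functions, and the sketch uses malloc/free where the patch allocates the clone out of mem_ctx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a nir_shader: just enough state to observe
 * whether a compile call modified it. */
struct toy_shader {
   int num_instrs;
   bool postprocessed;
};

/* Hypothetical analogue of nir_shader_clone(): returns a heap copy that
 * the caller owns, or NULL if allocation fails. */
static struct toy_shader *
toy_shader_clone(const struct toy_shader *src)
{
   struct toy_shader *copy = malloc(sizeof(*copy));
   if (copy == NULL)
      return NULL;
   memcpy(copy, src, sizeof(*copy));
   return copy;
}

/* Hypothetical analogue of brw_postprocess_nir(): alters the shader in
 * place, in a way that depends on the program key. */
static void
toy_postprocess(struct toy_shader *shader, int key_extra_instrs)
{
   shader->num_instrs += key_extra_instrs;
   shader->postprocessed = true;
}

/* Hypothetical analogue of brw_compile_foo(): takes a const source
 * shader, works on a private clone, and returns the resulting
 * instruction count (-1 on allocation failure). */
static int
toy_compile(const struct toy_shader *src_shader, int key_extra_instrs)
{
   struct toy_shader *shader = toy_shader_clone(src_shader);
   if (shader == NULL)
      return -1;

   toy_postprocess(shader, key_extra_instrs);
   int size = shader->num_instrs;

   free(shader);
   return size;
}
```

Compiling the same source twice with different keys gives independent results and leaves the source untouched. Without the clone, the second call would start from a shader that the first call had already modified.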
Thanks for the very detailed reply:

> a) At one point we were doing all our NIR stuff on-demand.  No one
> complained about the performance impact.
>
> b) We pre-compile for the common case at link time so we will only
> hit this at draw time if we actually need a recompile.

Yeah, this is true. Although it looks like with this we are stepping a
bit further into the performance unpredictability issues that have
usually been a concern with OpenGL drivers.

> c) While brw_postprocess_nir looks like it does a lot of stuff, it
> calls a fixed number of mostly linear-time passes.  The most expensive
> is almost certainly the out-of-SSA pass and that one is on the
> order of register allocation in the back-end:
> http://people.freedesktop.org/~cwabbott0/perf-shader-db-nir.svg

Yeah, that could be true.

> Will it have an effect?  Yes.  Will it be that bad?  I don't think so.
> It also has some advantages.  In the case of texture-rectangle, it
> lets us delete some fairly nasty code in the fs back-end compiler and
> gives us support (without porting that nasty code) in the vec4
> back-end.  In the case of texture swizzle, it lets us share code
> between the backends, cleans up the backends, and gives NIR a chance
> to optimize the swizzle.  I think this last point is important.  There
> are some cases such as "if (...) { a = tex } else { a = tex }" where
> we end up with a pipeline-stalling move right after both tex
> operations that the FS backend has a lot of trouble getting rid of.
> NIR, on the other hand, might be able to see through the phi node that
> it's just a swizzling mov, reswizzle the phi, and get rid of the mov.
> I haven't really looked into this problem in detail yet, but I think
> it will be easier with texture swizzle out of the way.
>
> So, yes, I think it's worth it.  Do I have hard numbers to prove it?
> No.

I agree that there are clear benefits.
I was just curious about the performance side in this case because it
looked like we were doing something that is usually discouraged, so I
wanted to raise that discussion. If you feel like it is worth it I
certainly have no objections :)

This patch is,
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

> --Jason
>
> > > > >     /* key->alpha_test_func means simulating alpha testing via discards,
> > > > >      * so the shader definitely kills pixels.
> > > > >      */
> > > > > @@ -5618,11 +5622,14 @@ brw_compile_cs(const struct brw_compiler *compiler, void *log_data,
> > > > >                 void *mem_ctx,
> > > > >                 const struct brw_cs_prog_key *key,
> > > > >                 struct brw_cs_prog_data *prog_data,
> > > > > -               const nir_shader *shader,
> > > > > +               const nir_shader *src_shader,
> > > > >                 int shader_time_index,
> > > > >                 unsigned *final_assembly_size,
> > > > >                 char **error_str)
> > > > >  {
> > > > > +   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> > > > > +   brw_postprocess_nir(shader, compiler->devinfo, true);
> > > > > +
> > > > >     prog_data->local_size[0] = shader->info.cs.local_size[0];
> > > > >     prog_data->local_size[1] = shader->info.cs.local_size[1];
> > > > >     prog_data->local_size[2] = shader->info.cs.local_size[2];
> > > > > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c
> > > > > index 21c2648..693b9cd 100644
> > > > > --- a/src/mesa/drivers/dri/i965/brw_nir.c
> > > > > +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> > > > > @@ -391,7 +391,6 @@ brw_create_nir(struct brw_context *brw,
> > > > >
> > > > >     brw_preprocess_nir(nir, is_scalar);
> > > > >     brw_lower_nir(nir, devinfo, shader_prog, is_scalar);
> > > > > -   brw_postprocess_nir(nir, devinfo, is_scalar);
> > > > >
> > > > >     return nir;
> > > > >  }
> > > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > > > > index 8350a02..9f75bb6 100644
> > > > > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > > > > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > > > > @@ -2028,13 +2028,16 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data,
> > > > >                 void *mem_ctx,
> > > > >                 const struct brw_vs_prog_key *key,
> > > > >                 struct brw_vs_prog_data *prog_data,
> > > > > -               const nir_shader *shader,
> > > > > +               const nir_shader *src_shader,
> > > > >                 gl_clip_plane *clip_planes,
> > > > >                 bool use_legacy_snorm_formula,
> > > > >                 int shader_time_index,
> > > > >                 unsigned *final_assembly_size,
> > > > >                 char **error_str)
> > > > >  {
> > > > > +   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> > > > > +   brw_postprocess_nir(shader, compiler->devinfo, compiler->scalar_vs);
> > > > > +
> > > > >     const unsigned *assembly = NULL;
> > > > >
> > > > >     unsigned nr_attributes = _mesa_bitcount_64(prog_data->inputs_read);
> > > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> > > > > index 49c1083..92b15d9 100644
> > > > > --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> > > > > +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> > > > > @@ -30,6 +30,7 @@
> > > > >  #include "brw_vec4_gs_visitor.h"
> > > > >  #include "gen6_gs_visitor.h"
> > > > >  #include "brw_fs.h"
> > > > > +#include "brw_nir.h"
> > > > >
> > > > >  namespace brw {
> > > > >
> > > > > @@ -604,7 +605,7 @@ brw_compile_gs(const struct brw_compiler *compiler, void *log_data,
> > > > >                 void *mem_ctx,
> > > > >                 const struct brw_gs_prog_key *key,
> > > > >                 struct brw_gs_prog_data *prog_data,
> > > > > -               const nir_shader *shader,
> > > > > +               const nir_shader *src_shader,
> > > > >                 struct gl_shader_program *shader_prog,
> > > > >                 int shader_time_index,
> > > > >                 unsigned *final_assembly_size,
> > > > > @@ -614,6 +615,9 @@ brw_compile_gs(const struct brw_compiler *compiler, void *log_data,
> > > > >     memset(&c, 0, sizeof(c));
> > > > >     c.key = *key;
> > > > >
> > > > > +   nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> > > > > +   brw_postprocess_nir(shader, compiler->devinfo, compiler->scalar_gs);
> > > > > +
> > > > >     prog_data->include_primitive_id =
> > > > >        (shader->info.inputs_read & VARYING_BIT_PRIMITIVE_ID) != 0;
> > > > >

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev