On Sun, 2018-02-04 at 14:40 -0500, Connor Abbott wrote:
> On Mon, Dec 11, 2017 at 11:01 AM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> > On Mon, Dec 11, 2017 at 12:55 AM, Iago Toral <ito...@igalia.com> wrote:
> > > This didn't get any reviews yet. Any takers?
> > >
> > > On Fri, 2017-12-01 at 13:46 +0100, Iago Toral Quiroga wrote:
> > > > Otherwise loop unrolling will fail to see the actual cost of
> > > > the unrolling operations when the loop body contains 64-bit
> > > > integer instructions, and especially when the divmod64 lowering
> > > > applies, since that lowering is quite expensive.
> > > >
> > > > Without this change, some in-development CTS tests for int64
> > > > get stuck forever trying to register-allocate a shader with
> > > > over 50K SSA values. The large number of SSA values is the
> > > > result of NIR first unrolling multiple seemingly simple loops
> > > > that involve int64 instructions, only to then lower these
> > > > instructions to produce a massive pile of code (due to the
> > > > divmod64 lowering in the unrolled instructions).
> > > >
> > > > With this change, loop unrolling will see the loops with the
> > > > int64 code already lowered and will realize that it is too
> > > > expensive to unroll.
> >
> > Hrm... I'm not quite sure what I think of this. I put it after
> > nir_optimize because I wanted opt_algebraic to be able to work its
> > magic and hopefully remove a bunch of int64 ops before we lower
> > them. In particular, we have optimizations to remove integer
> > division and replace it with shifts. However, loop unrolling does
> > need to happen before lower_indirect_derefs so that
> > lower_indirect_derefs will do as little work as possible.
> >
> > This is a bit of a pickle... I don't really want to add a third
> > brw_nir_optimize call. It probably wouldn't be the end of the
> > world but it does add compile time.
> >
> > One crazy idea which I don't think I like would be to have a quick
> > pass that walks the IR and sees if there are any 64-bit SSA values.
> > If there are, we run brw_nir_optimize without loop unrolling, then
> > 64-bit lowering, and then we go into the normal brw_nir_optimize.
> >
> > --Jason
>
> Why don't we just add some sort of backend-specific code-size metric
> to the loop unrolling, rather than just counting NIR instructions?
> i.e. something like a num_assembly_instructions(nir_instr *) function
> pointer in nir_shader_compiler_options. The root of the problem is
> that different NIR instructions can turn into vastly different
> numbers of assembly instructions, but we really care about the
> latter, so the cutoff isn't doing its job of avoiding code-size
> blowup. As far as I'm aware, this is what most other compilers
> (e.g. LLVM) do to solve this problem.
Yeah, I think that makes sense and sounds like something we would like
to have at some point.

Iago

> > > > ---
> > > >  src/intel/compiler/brw_nir.c | 8 ++++----
> > > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> > > > index 8f3f77f89a..ef12cdfff8 100644
> > > > --- a/src/intel/compiler/brw_nir.c
> > > > +++ b/src/intel/compiler/brw_nir.c
> > > > @@ -636,6 +636,10 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
> > > >
> > > >     OPT(nir_split_var_copies);
> > > >
> > > > +   nir_lower_int64(nir, nir_lower_imul64 |
> > > > +                        nir_lower_isign64 |
> > > > +                        nir_lower_divmod64);
> > > > +
> > > >     nir = brw_nir_optimize(nir, compiler, is_scalar);
> > > >
> > > >     if (is_scalar) {
> > > > @@ -663,10 +667,6 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
> > > >        brw_nir_no_indirect_mask(compiler, nir->info.stage);
> > > >     nir_lower_indirect_derefs(nir, indirect_mask);
> > > >
> > > > -   nir_lower_int64(nir, nir_lower_imul64 |
> > > > -                        nir_lower_isign64 |
> > > > -                        nir_lower_divmod64);
> > > > -
> > > >     /* Get rid of split copies */
> > > >     nir = brw_nir_optimize(nir, compiler, is_scalar);

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev