On Wed, Dec 13, 2017 at 11:21 PM, Iago Toral <ito...@igalia.com> wrote:

> On Tue, 2017-12-12 at 08:20 +0100, Iago Toral wrote:
>
> On Mon, 2017-12-11 at 08:01 -0800, Jason Ekstrand wrote:
>
> On Mon, Dec 11, 2017 at 12:55 AM, Iago Toral <ito...@igalia.com> wrote:
>
> This didn't get any reviews yet. Any takers?
>
> On Fri, 2017-12-01 at 13:46 +0100, Iago Toral Quiroga wrote:
>> Otherwise loop unrolling will fail to see the actual cost of
>> the unrolling operations when the loop body contains 64-bit integer
>> instructions, and especially when the divmod64 lowering applies,
>> since its lowering is quite expensive.
>>
>> Without this change, some in-development CTS tests for int64
>> get stuck forever trying to register allocate a shader with
>> over 50K SSA values. The large number of SSA values is the result
>> of NIR first unrolling multiple seemingly simple loops that involve
>> int64 instructions, only to then lower these instructions to produce
>> a massive pile of code (due to the divmod64 lowering in the unrolled
>> instructions).
>>
>> With this change, loop unrolling will see the loops with the int64
>> code already lowered and will realize that it is too expensive to
>> unroll.
>
>
> Hrm... I'm not quite sure what I think of this. I put it after nir_optimize
> because I wanted opt_algebraic to be able to work its magic and hopefully
> remove a bunch of int64 ops before we lower them. In particular, we have
> optimizations to remove integer division and replace it with shifts.
> However, loop unrolling does need to happen before lower_indirect_derefs so
> that lower_indirect_derefs will do as little work as possible.
>
> This is a bit of a pickle... I don't really want to add a third
> brw_nir_optimize call. It probably wouldn't be the end of the world but it
> does add compile time.
>
> One crazy idea which I don't think I like would be to have a quick pass that
> walks the IR and sees if there are any 64-bit SSA values. If it does, we
> run brw_nir_optimize without loop unrolling then 64-bit lowering and then we
> go into the normal brw_nir_optimize.
>
>
> With the constraints you mention above, I am not sure that we have many more
> options... what if we always run opt_algebraic first followed by int64
> lowering before the first nir_optimize? That would only add an extra
> opt_algebraic instead of a full nir_optimize. Would that be better than
> adding that 64-bit SSA scan pre-pass?
>
>
> We still need to make a decision for this, does my proposal sound better
> than the other options on the table? If not I guess we should go with
> the 64-bit SSA scan pre-pass.
Realized I never responded to this -- sorry. Yes, I think your proposal sounds good.
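
For anyone skimming the thread later, the ordering problem can be illustrated with a toy model. This is only a sketch in Python, not Mesa code: the op names, the `LOWERED_COST` table, the `UNROLL_LIMIT` budget and all function bodies are hypothetical stand-ins chosen for illustration, and only the pass names (opt_algebraic, int64 lowering, loop unrolling) come from the discussion above.

```python
# Toy model only: op names, expansion costs and the unroll budget are
# made-up numbers for illustration, not values taken from Mesa.
LOWERED_COST = {"idiv64": 100, "iadd64": 3, "ishr64": 4}
UNROLL_LIMIT = 96  # hypothetical instruction budget for unrolling


def opt_algebraic(body):
    """Replace 64-bit divisions by a power of two with shifts."""
    return ["ishr64" if op == "idiv64_pow2" else op for op in body]


def lower_int64(body):
    """Expand every 64-bit op into a run of 32-bit ops."""
    out = []
    for op in body:
        # A power-of-two division that was not optimized away
        # lowers like a full division.
        key = "idiv64" if op == "idiv64_pow2" else op
        out.extend(["alu32"] * LOWERED_COST.get(key, 1))
    return out


def unroll(body, trip_count):
    """Unroll only if the unrolled size fits the budget."""
    if len(body) * trip_count <= UNROLL_LIMIT:
        return body * trip_count
    return body  # too expensive: the loop stays a loop


def unroll_then_lower(body, trip_count):
    """Unrolling sees the cheap-looking 64-bit ops, lowering runs afterwards."""
    return lower_int64(unroll(opt_algebraic(body), trip_count))


def lower_then_unroll(body, trip_count):
    """Proposed order: opt_algebraic, then int64 lowering, then unrolling."""
    return unroll(lower_int64(opt_algebraic(body)), trip_count)


if __name__ == "__main__":
    body = ["iadd64", "idiv64", "idiv64_pow2"]
    print(len(unroll_then_lower(body, 16)))
    print(len(lower_then_unroll(body, 16)))
```

With these made-up numbers, a three-op body with 16 iterations looks cheap to the unroller before lowering and ends up as 1712 ops, while lowering first exposes a 107-op body that exceeds the budget and stays a loop. Running the algebraic step first still matters: a body of `["iadd64", "idiv64_pow2"]` with 8 iterations lowers to 7 ops per iteration and is unrolled to 56 ops, whereas an unoptimized division would have blocked the unroll.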