On Sun, May 29, 2016 at 12:01 PM, Francisco Jerez <curroje...@riseup.net> wrote:
> Francisco Jerez <curroje...@riseup.net> writes: > > > Jason Ekstrand <ja...@jlekstrand.net> writes: > > > >> On Fri, May 27, 2016 at 7:05 PM, Francisco Jerez <curroje...@riseup.net > > > >> wrote: > >> > >>> This fixes the few code quality regressions from the previous series > >>> enabling SIMD32 CS codegen in the back-end -- AFAICT by the end of the > >>> series we can finally enable GL 4.3 on all Gen8+ hardware. > >>> > >>> Patches 1-8 delay the SIMD lowering pass after the bulk of > >>> optimization passes have been run, which should decrease the > >>> compilation time of mainly SIMD32 shaders and improve the code quality > >>> of SIMD32 shaders on all generations and shaders of any dispatch width > >>> on older generations (up to and including IVB) that use SIMD lowering > >>> more intensively to implement various workarounds. > >>> > >>> Patches 9-14 rework the SIMD lowering pass to avoid emitting the copy > >>> instructions used to zip and unzip register regions where possible, > >>> since the register coalesce and copy propagation passes seem to > >>> perform rather poorly at getting rid of them in some cases. In the > >>> long term we'll likely want to improve the register coalesce pass > >>> irrespective of these changes. > >>> > >>> Patches 15-20 improve the compute-to-mrf pass used on Gen4-6 to handle > >>> cases where the source of a VGRF-to-MRF copy is initialized by the > >>> shader using multiple single-GRF writes, which becomes far more common > >>> with the additional SIMD lowering going on after this series. > >>> > >>> Patches 21-24 are some other assorted changes improving code quality > >>> on older gens. > >>> > >>> I wanted to provide more detailed (e.g. per commit) shader-db stats > >>> with this series, but kind of ran out of time. Let me know if you > >>> would like to see more evidence that any of the changes below is > >>> improving code quality in case it's not clear from the commit alone. > >>> > >>> [PATCH 01/25] i965/fs: Let CSE handle logical sampler sends as > expressions. > >>> [PATCH 02/25] i965/fs: Allow constant propagation into logical send > >>> sources. > >>> [PATCH 03/25] i965/fs: Add FS_OPCODE_FB_WRITE_LOGICAL to > >>> has_side_effects(). > >>> [PATCH 04/25] i965/fs: Run SIMD and logical send lowering after the > >>> optimization loop. > >>> [PATCH 05/25] i965/fs: Take opt_redundant_discard_jumps out of the > >>> optimization loop. > >>> [PATCH 06/25] i965/fs: Fix UB list sentinel dereference in > >>> opt_sampler_eot(). > >>> [PATCH 07/25] i965/fs: Implement opt_sampler_eot() in terms of logical > >>> sends. > >> > >> [PATCH 08/25] SQUASH: i965/fs: Add basic dataflow check to > >>> opt_sampler_eot(). > >>> > >> > >> > >>> [PATCH 09/25] i965/fs: Refactor offset() into a separate function > taking > >>> the width as argument. > >>> [PATCH 10/25] i965/fs: Generalize regions_overlap() from copy > propagation > >>> to handle non-VGRF files. > >>> [PATCH 11/25] i965/fs: Factor out region zipping and unzipping from the > >>> SIMD lowering pass. > >>> [PATCH 12/25] i965/fs: Skip SIMD lowering source unzipping for regular > >>> scalar regions. > >>> [PATCH 13/25] i965/fs: Skip SIMD lowering destination zipping if > possible. > >>> [PATCH 14/25] i965/fs: Reindent emit_zip(). > >>> > >> > >> 9-14 Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> > >> > >> > >>> [PATCH 15/25] i965/fs: Teach regions_overlap() about COMPR4 MRF > regions. > >>> [PATCH 16/25] i965/fs: Simplify and improve accuracy of > compute_to_mrf() > >>> by using regions_overlap(). > >>> [PATCH 17/25] i965/fs: Fix compute-to-mrf VGRF region coverage > condition. > >>> [PATCH 18/25] i965/fs: Refactor compute_to_mrf() to split search and > >>> rewrite into separate loops. > >>> [PATCH 19/25] i965/fs: Teach compute_to_mrf about the COMPR4 address > >>> transformation. > >>> [PATCH 20/25] i965/fs: Extend compute_to_mrf() to coalesce VGRFs > >>> initialized by multiple single-GRF writes. > >>> [PATCH 21/25] i965/fs: Extend remove_duplicate_mrf_writes() to handle > >>> non-VGRF to MRF copies. > >>> > >> > >> 15-21 scare me. A lot. They even make me think that forking the > compiler > >> between SNB and IVB may be a good idea. :-/ MRFs are annoying, but > COMPR4 > >> is such a gross hack that I really want to teach as little of the > compiler > >> about it as possible. > >> > > Heh, my impression (from writing the patches) is that COMPR4 was the > > easiest part of it, it's not entirely unlike the usual Align1 source > > regions -- with hard-wired width and vstride. I agree it's kind of a > > gross hardware feature but all it took to handle it in compute-to-mrf > > was some simple arithmetic [it would get slightly more complex if > > compute-to-mrf attempted to handle COMPR4 LOAD_PAYLOAD instructions, but > > I don't think it would necessarily be an insane idea]. > > > > Honestly, what made patches 16-20 really hard to write (and maybe also > > hard to review, I don't know), was disentangling the highly > > special-cased and dubious dataflow logic of the compute-to-mrf pass > > (patches 16 and 17 could qualify as bugfixes...), and that had little to > > Uhm, I just noticed that patch 20 is effectively a bugfix too, because > right now the compute-to-mrf pass only looks at and rewrites the *first* > def it encounters of the VGRF it's coalescing away, so it will end up > corrupting the program if there was more than one def of the VGRF region > read by the copy instruction, because the copy instruction is removed. > > I don't see any reason why this bug wouldn't be possible on master > already, but it definitely causes piglit regressions on Gen4-6 in > combination with patch 11 just because the ordering in which the zipping > instructions are emitted happens to change. I'm afraid we need to land > patches 16-20 before patch 11... > Ok, I think the conclusion of the matter is that I don't like compute_to_mrf... The patches look correct and I *think* they are strictly an improvement but compute_to_mrf looks like it will probably just fall over if it encounters a partial write. Your plan of making register_coalesce handle MRFs is probably the best long-term plan. For now, Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> I just hope we aren't opening a giant can of worms... (The can was already open) > > do with COMPR4 or even with MRFs. Compute-to-mrf is a crappy register > > coalescing pass that can only handle a small subset of the cases where > > one could possibly elide VGRF-to-MRF copies. > > > > My suspicion is that the situation would improve (both in terms of code > > generation quality on MRF platforms and codebase maintainability) if > > instead of forking the driver we got rid of the compute-to-mrf pass > > altogether and used the normal register coalesce pass to coalesce VGRFs > > into MRFs (since the one difficult part about coalescing registers is > > working out the dataflow and variable interference, COMPR4 is just the > > surface). > > > > BTW, do you have any objection against PATCH 21? It should be fully > > independent from the other MRF-related ones. > > > >> So here's the million dollar question: Do we need them? and, more > >> importantly, do we need them now? I didn't see anything wrong in my > brief > >> skimming but don't call that a review. > >> > > *Shrug*, nothing in this series is strictly necessary [heh, except of > > course PATCH 25 ;)], but if we don't include them the overall shader-db > > balance from the whole series (including parts 1-4 except for patches > > 16-20 from this series) on SNB would be: > > > > total instructions in shared programs: 5721434 -> 5755717 (0.60%) > > instructions in affected programs: 2402892 -> 2437175 (1.43%) > > helped: 1284 > > HURT: 12148 > > > > instead of the following (not excluding any changes): > > > > total instructions in shared programs: 5721284 -> 5708369 (-0.23%) > > instructions in affected programs: 489340 -> 476425 (-2.64%) > > helped: 1398 > > HURT: 21 > > > >> > >>> [PATCH 22/25] i965/fs: Fix constant combining for instructions that > cannot > >>> accept source mods. > >>> [PATCH 23/25] i965/fs: Allow scalar source regions on SNB math > >>> instructions. > >>> [PATCH 24/25] i965/fs: Skip gen4 pre/post-send dependency workaronds > for > >>> the first/last block. > >>> [PATCH 25/25] i965: Expose GL 4.3 on Gen8+. > >>> > >> > >> 22-25 are Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> > >> > > > > Thanks. > > > >> > >>> _______________________________________________ > >>> mesa-dev mailing list > >>> mesa-dev@lists.freedesktop.org > >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > >>> >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev