On Sun, 2020-08-23 at 00:24 +0100, Roger Sayle wrote:
> Hi Dave,
> I actually think using plus_xor_ior operator is useful.  It means that
> if combine, inlining or some other RTL simplification generates these
> variants, these forms will still be recognized by the backend.  It's
> more typing, but the compiler produces better code.
>
> Here's what I have so far, but please feel free to modify anything.
> I'll leave the rest to you.
>
> With this patch:
>
> unsigned long long rotl4(unsigned long long x)
> {
>   return (x<<4) | (x>>60);
> }
>
> unsigned long long rotr4(unsigned long long x)
> {
>   return (x<<60) | (x>>4);
> }
>
> which previously generated:
>
> rotl4:  depd,z %r26,59,60,%r28
>         extrd,u %r26,3,4,%r26
>         bve (%r2)
>         or %r26,%r28,%r28
>
> rotr4:  extrd,u %r26,59,60,%r28
>         depd,z %r26,3,4,%r26
>         bve (%r2)
>         or %r26,%r28,%r28
>
> now produces:
>
> rotl4:  bve (%r2)
>         shrpd %r26,%r26,60,%r28
>
> rotr4:  bve (%r2)
>         shrpd %r26,%r26,4,%r28
>
>
> I'm guessing this is very similar to what you were thinking (or what I
> described previously).
>
> Many thanks again for trying out these patches/suggestions for me.

So I put this one into the tester overnight.
It 3-stages, but trips:

Tests that now fail, but worked before (5 tests):

gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2
gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2
gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
gcc.target/hppa/shadd-2.c scan-assembler-times sh.add 2

I think we've already discussed shadd-2.  It's not immediately clear if
the slsr failure is due to this change or something different -- the
latter seems like a definite possibility as we're changing things as we
leave gimple, but the test is checking the result of the gimple
optimizers.

Your call on how to proceed.  I can put additional patches into the
tester easily, so if you've got something you want investigated, don't
hesitate to reach out.

jeff