On Fri, Mar 24, 2017 at 07:16:32PM -0500, Segher Boessenkool wrote: > Hi Mike, > > On Fri, Mar 24, 2017 at 07:23:02PM -0400, Michael Meissner wrote: > > Reload fumbles in certain conditions. > > Yeah. And it does not need bswap to get totally lost with this, so this > patch is a workaround, not a fix. > > It does make things nicer though :-) > > > LRA generates working code, but the > > store and the load with byte reverse from the same location, can slow things > > down compared to the operation on registers. > > How so? You mean (say) lwbrx after doing a stw to that same address? > We have that problem in general :-/
With the second code sample that shows the problem, with: -O1 -mlittle -mno-lra the compiler aborts. If you remove the -mno-lra (or use -mlra on GCC 6), the compiler does not abort, however the code is sub-optimal. LRA has the value in a register it does: lwa 9,108(1) li 10,0 ori 10,10,0xc070 stdx 9,10,1 bl b ... other code li 9,0 ori 9,9,0xc070 lwbrx 9,9,1 With my patches, it does: lwa 14,108(1) bl b ... other code rotlwi 9,14,24 rlwimi 9,14,8,8,15 rlwimi 9,14,8,24,31 With -O3, it generates the same code (more or less) with both register allocators. Sure, it would be nice to fold the bswap with the original load. > > Thanks for the extensive testing. > > > Can I apply the patch to the GCC 7 trunk? Since the patch shows up as an > > abort > > in GCC 6 and not in GCC 7 (unless you disable LRA), I would like to apply > > the > > patch as soon as possible to the GCC 6 branch. > > A few remarks: > > > +; Types that have hardware load/store with byte reverse > > +(define_mode_iterator BSWAP [HI > > + SI > > + (DI "TARGET_POWERPC64 && TARGET_LDBRX")]) > > The patch would be much easier to read if you had done this step (with HSI > as well) separately. Oh well. Actually, I developed it that way, and I can easily go back to that. I thought I would be dinged because I didn't combine them. :-) > > +(define_expand "bswap<mode>2" > > > + if (MEM_P (src)) > > + emit_insn (gen_bswap<mode>2_load (dest, src)); > > + else > > + { > > + if (MEM_P (dest)) > > + emit_insn (gen_bswap<mode>2_store (dest, src)); > > + else > > + emit_insn (gen_bswap<mode>2_reg (dest, src)); > > + } > > + DONE; > > Please avoid the extra nesting, i.e. do > > + if (MEM_P (src)) > + emit_insn (gen_bswap<mode>2_load (dest, src)); > + else if (MEM_P (dest)) > + emit_insn (gen_bswap<mode>2_store (dest, src)); > + else > + emit_insn (gen_bswap<mode>2_reg (dest, src)); > + DONE; Ok. The patch originally had a force_reg in there, and I removed it when it didn't seem necessary any more. > (also for bswapdi2). > > > + [(set_attr "length" "12") > > + (set_attr "type" "*")]) > > Lose that last line? You don't need to explicitly set things to their > default value ;-) > OTOH, you might want to make it type "three" instead? > > > + [(set_attr "length" "36") > > + (set_attr "type" "*")]) > > Same. > > This patch is okay for trunk. Also for 6, but what is the hurry there? -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797