Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-20 Thread Jeff Law
On 05/20/14 11:14, Wei Mi wrote: On Tue, May 20, 2014 at 12:13 AM, Bin.Cheng wrote: On Tue, May 20, 2014 at 1:30 AM, Jeff Law wrote: On 05/19/14 00:38, Bin.Cheng wrote: On Sat, May 17, 2014 at 12:32 AM, Jeff Law wrote: On 05/16/14 04:07, Bin.Cheng wrote: But can't you go through movXX

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-20 Thread Wei Mi
On Tue, May 20, 2014 at 12:13 AM, Bin.Cheng wrote: > On Tue, May 20, 2014 at 1:30 AM, Jeff Law wrote: >> On 05/19/14 00:38, Bin.Cheng wrote: >>> On Sat, May 17, 2014 at 12:32 AM, Jeff Law wrote: On 05/16/14 04:07, Bin.Cheng wrote: But can't you go through movXX

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-20 Thread Jeff Law
On 05/20/14 01:13, Bin.Cheng wrote: The idea being that common cases where a pair of moves can be turned into a single wider move without having to write target code to make that happen much of the time, i.e. 2xQI->HI, 2xHI->SI, 2xSI->DI, 2xSF->DF. For things outside those simple cases, fall back to

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-20 Thread Bin.Cheng
On Tue, May 20, 2014 at 5:02 AM, Mike Stump wrote: > On May 19, 2014, at 10:30 AM, Jeff Law wrote: >>> Yes, I think it's more than upsizing the mode. There is another >>> example from one of x86's candidate peephole patches at >>> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00467.html >>> >>> Th

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-20 Thread Bin.Cheng
On Tue, May 20, 2014 at 1:30 AM, Jeff Law wrote: > On 05/19/14 00:38, Bin.Cheng wrote: >> On Sat, May 17, 2014 at 12:32 AM, Jeff Law wrote: >>> On 05/16/14 04:07, Bin.Cheng wrote: >>> But can't you go through movXX to generate either the simple insn on the >>> ARM >>> or the PA

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-19 Thread Mike Stump
On May 19, 2014, at 10:30 AM, Jeff Law wrote: >> Yes, I think it's more than upsizing the mode. There is another >> example from one of x86's candidate peephole patches at >> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00467.html >> >> The patch wants to do the transformation below, which I think is

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-19 Thread Jeff Law
On 05/19/14 00:38, Bin.Cheng wrote: On Sat, May 17, 2014 at 12:32 AM, Jeff Law wrote: On 05/16/14 04:07, Bin.Cheng wrote: Yes, I think this one does have a good reason. The target independent pass just makes sure that two consecutive memory access instructions are free of data-dependency wit

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-19 Thread Jeff Law
On 05/19/14 00:38, Bin.Cheng wrote: 1) Should we do it in a separate pass, or just along with the scheduler? ISTM that when we're able to combine insns, that can impact the schedule we'd like to generate, possibly in significant ways. That argues for a separate pass that runs before the scheduler.

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-19 Thread Jeff Law
On 05/19/14 00:38, Bin.Cheng wrote: On Sat, May 17, 2014 at 12:52 AM, Mike Stump wrote: On May 16, 2014, at 3:07 AM, Bin.Cheng wrote: I don't see how regrename will help resolve [base+offset] false dependencies. Can you explain? I'd expect effects from hardreg-copyprop "commoning" a base re

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-18 Thread Bin.Cheng
On Sat, May 17, 2014 at 12:18 AM, Jeff Law wrote: > On 05/16/14 04:07, Bin.Cheng wrote: >> On Fri, May 16, 2014 at 1:13 AM, Jeff Law wrote: >>> On 05/15/14 10:51, Mike Stump wrote: On May 15, 2014, at 12:26 AM, bin.cheng wrote: Here is a new GCC

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-18 Thread Bin.Cheng
On Sat, May 17, 2014 at 12:52 AM, Mike Stump wrote: > On May 16, 2014, at 3:07 AM, Bin.Cheng wrote: >> >>> I don't see how regrename will help resolve [base+offset] false >>> dependencies. Can you explain? I'd expect effects from >>> hardreg-copyprop "commoning" a base register. >> It's the regis

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-18 Thread Bin.Cheng
On Sat, May 17, 2014 at 12:32 AM, Jeff Law wrote: > On 05/16/14 04:07, Bin.Cheng wrote: > >> Yes, I think this one does have a good reason. The target independent >> pass just makes sure that two consecutive memory access instructions >> are free of data-dependency with each other, then feeds it

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Oleg Endo
On Fri, 2014-05-16 at 18:10 +0800, Bin.Cheng wrote: > On Thu, May 15, 2014 at 6:31 PM, Oleg Endo wrote: > > > > How about the following. > > Instead of adding new hooks and inserting the pass to the general pass > > list, make the new > > pass class take the necessary callback functions directly.

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Mike Stump
On May 16, 2014, at 3:07 AM, Bin.Cheng wrote: > >> I don't see how regrename will help resolve [base+offset] false >> dependencies. Can you explain? I'd expect effects from >> hardreg-copyprop "commoning" a base register. > It's the register operand's false dependency, rather than the base's > on

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Jeff Law
On 05/16/14 04:07, Bin.Cheng wrote: Yes, I think this one does have a good reason. The target-independent pass just makes sure that two consecutive memory access instructions are free of data dependencies on each other, then feeds them to the back-end hook. It's the back-end's responsibility to generate

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Jeff Law
On 05/16/14 04:07, Bin.Cheng wrote: On Fri, May 16, 2014 at 1:13 AM, Jeff Law wrote: On 05/15/14 10:51, Mike Stump wrote: On May 15, 2014, at 12:26 AM, bin.cheng wrote: Here is a new GCC pass that looks through each basic block and merges paired loads/stores even when they are not adjace

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Ramana Radhakrishnan
> > Btw, the bswap pass enhancements that are currently in review may > also be an opportunity to catch these. They can merge adjacent > loads that are used "composed" (but not yet composed by storing > into adjacent memory). The basic-block vectorizer should also > handle this (if the compositio

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Steven Bosscher
On Fri, May 16, 2014 at 12:51 PM, Richard Biener wrote: > Btw, the bswap pass enhancements that are currently in review may > also be an opportunity to catch these. They can merge adjacent > loads that are used "composed" (but not yet composed by storing > into adjacent memory). The basic-block v

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 12:10 PM, Bin.Cheng wrote: > On Thu, May 15, 2014 at 6:31 PM, Oleg Endo wrote: >> Hi, >> >> On 15 May 2014, at 09:26, "bin.cheng" wrote: >> >>> Hi, >>> Targets like ARM and AARCH64 support double-word load store instructions, >>> and these instructions are generally faste

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Thu, May 15, 2014 at 6:31 PM, Oleg Endo wrote: > Hi, > > On 15 May 2014, at 09:26, "bin.cheng" wrote: > >> Hi, >> Targets like ARM and AARCH64 support double-word load store instructions, >> and these instructions are generally faster than the corresponding two >> load/stores. GCC currently u

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Fri, May 16, 2014 at 12:57 AM, Steven Bosscher wrote: > On Thu, May 15, 2014 at 9:26 AM, bin.cheng wrote: >> Hi, >> Targets like ARM and AARCH64 support double-word load store instructions, >> and these instructions are generally faster than the corresponding two >> load/stores. GCC currently

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Fri, May 16, 2014 at 1:13 AM, Jeff Law wrote: > On 05/15/14 10:51, Mike Stump wrote: >> On May 15, 2014, at 12:26 AM, bin.cheng wrote: >>> Here is a new GCC pass that looks through each basic block >>> and merges paired loads/stores even when they are not adjacent to each >>> other. >

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Fri, May 16, 2014 at 12:51 AM, Mike Stump wrote: > On May 15, 2014, at 12:26 AM, bin.cheng wrote: >> Here is a new GCC pass that looks through each basic block and >> merges paired loads/stores even when they are not adjacent to each other. > > So I have a target that has load and store mult

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Mike Stump
On May 15, 2014, at 1:01 PM, Jeff Law wrote: > For the memory optimizations, IIRC, the dependencies keep them from getting > into the ready queue at the same time. Thus it's significantly harder to get > them to issue consecutively when you've got an issue rate > 1. > But if you've got an issu

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Jeff Law
On 05/15/14 12:41, Mike Stump wrote: On May 15, 2014, at 10:13 AM, Jeff Law wrote: I've poked at the scheduler several times to do similar stuff, but was never really satisfied with the results and never tried to polish those prototypes into something worth submitting. What was lacking? The

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Mike Stump
On May 15, 2014, at 10:13 AM, Jeff Law wrote: > I've poked at the scheduler several times to do similar stuff, but was never > really satisfied with the results and never tried to polish those prototypes > into something worth submitting. What was lacking? The cleanliness of the patch or the,

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread H.J. Lu
On Thu, May 15, 2014 at 12:26 AM, bin.cheng wrote: > Hi, > Targets like ARM and AARCH64 support double-word load store instructions, > and these instructions are generally faster than the corresponding two > load/stores. GCC currently uses peephole2 to merge paired load/store into > one single in

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Jeff Law
On 05/15/14 10:51, Mike Stump wrote: On May 15, 2014, at 12:26 AM, bin.cheng wrote: Here is a new GCC pass that looks through each basic block and merges paired loads/stores even when they are not adjacent to each other. So I have a target that has load and store multiple support that suppo

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Steven Bosscher
On Thu, May 15, 2014 at 9:26 AM, bin.cheng wrote: > Hi, > Targets like ARM and AARCH64 support double-word load store instructions, > and these instructions are generally faster than the corresponding two > load/stores. GCC currently uses peephole2 to merge paired load/store into > one single inst

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Mike Stump
On May 15, 2014, at 12:26 AM, bin.cheng wrote: > Here is a new GCC pass that looks through each basic block and > merges paired loads/stores even when they are not adjacent to each other. So I have a target that has load and store multiple support that supports a large number of registers (2-n

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Oleg Endo
Hi, On 15 May 2014, at 09:26, "bin.cheng" wrote: > Hi, > Targets like ARM and AARCH64 support double-word load store instructions, > and these instructions are generally faster than the corresponding two > load/stores. GCC currently uses peephole2 to merge paired load/store into > one single in

[GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread bin.cheng
Hi, Targets like ARM and AARCH64 support double-word load/store instructions, and these instructions are generally faster than the corresponding two loads/stores. GCC currently uses peephole2 to merge paired loads/stores into a single instruction, which has a disadvantage: it can only handle simple