On Wed, Jan 06, 2010 at 04:18:06PM +0100, Jakub Jelinek wrote:
> On Wed, Jan 06, 2010 at 10:15:58AM +0000, Andrew Haley wrote:
> > On 01/06/2010 09:59 AM, Mark Colby wrote:
> > >>>> Yabbut, how come RTL cse can handle it in x86_64, but PPC not?
> > >>>
> > >>> Probably because the RTL on x86_64 uses and's and ior's, but PPC uses
> > >>> set's of zero_extract's (insvsi).
> > >>
> > >> Aha!  Yes, that'll probably be it.  It should be easy to fix cse to
> > >> recognize those too.
> > 
> > > I'm not familiar with the gcc source yet, but just in case I get the
> > > time to look at this, could anyone give me a file/line ref to dive
> > > into and examine?
> > 
> > Would you believe cse.c?  :-)
> > 
> > I can't find the line without investigating further.
> > 
> > Andrew.
> > 
> > P.S.  This is a nontrivial task if you don't know gcc, but might be a
> > good place for a beginner to start.  OTOH, might be hard: no way to
> > know without digging.
> 
> I've dug a little bit and this optimizes the testcase on PowerPC 32-bit.
> The patch is completely untested though.
> 
> On PowerPC 64-bit which apparently doesn't use ZERO_EXTRACT in this case I
> see a different issue.  It generates
>         li 3,0
>         ori 3,3,32820
>         sldi 3,3,16
> while IMHO 2 insns to load the constant would be completely sufficient,

Indeed.

> apparently rs6000_emit_set_long_const needs work.
>       lis 3,0x8034
>       extsw 3,3
> or
>       li 3,0x401a
>       sldi 3,3,17
> etc. do IMHO the same.

Huh? I don't think so:

- the first one loads 0xffff_ffff_8034_0000 into r3, because lis
  sign-extends its immediate, so the extsw looks redundant

- the second one ends up with 0x0000_0000_8034_0000 in r3, and looks optimal.
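
For anyone who wants to check the arithmetic, here is a rough Python sketch
that models only the five instructions quoted in this thread on a 64-bit
register. The helper names are made up for illustration, and the semantics
are my reading of the ISA (li/lis taken as addi/addis with RA=0, immediate
sign-extended), so treat it as a sanity check rather than a reference model.

```python
MASK64 = (1 << 64) - 1  # registers are modelled as 64-bit unsigned values


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to 64 bits."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK64


def li(simm):
    # li rD,SIMM: load the sign-extended 16-bit immediate
    return sext(simm, 16)


def lis(simm):
    # lis rD,SIMM: sign-extended 16-bit immediate, shifted left by 16
    return (sext(simm, 16) << 16) & MASK64


def ori(ra, uimm):
    # ori rD,rA,UIMM: OR with the zero-extended 16-bit immediate
    return ra | (uimm & 0xFFFF)


def sldi(ra, n):
    # sldi rD,rA,n: shift left by n, bits shifted out of 64 are lost
    return (ra << n) & MASK64


def extsw(ra):
    # extsw rD,rA: sign-extend the low 32 bits
    return sext(ra, 32)


original = sldi(ori(li(0), 32820), 16)   # li 3,0 / ori 3,3,32820 / sldi 3,3,16
first = extsw(lis(0x8034))               # lis 3,0x8034 / extsw 3,3
first_without_extsw = lis(0x8034)        # lis 3,0x8034 alone
second = sldi(li(0x401a), 17)            # li 3,0x401a / sldi 3,3,17

for name, value in [("original", original), ("first", first),
                    ("first w/o extsw", first_without_extsw),
                    ("second", second)]:
    print(f"{name:16s} {value:#018x}")
```

Under those assumptions the three-insn sequence and the li/sldi pair agree,
while the lis variant differs in the upper 32 bits with or without the extsw.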

        Gabriel
