On Wed, Jan 06, 2010 at 04:18:06PM +0100, Jakub Jelinek wrote:
> On Wed, Jan 06, 2010 at 10:15:58AM +0000, Andrew Haley wrote:
> > On 01/06/2010 09:59 AM, Mark Colby wrote:
> > >>>> Yabbut, how come RTL cse can handle it in x86_64, but PPC not?
> > >>>
> > >>> Probably because the RTL on x86_64 uses and's and ior's, but PPC uses
> > >>> set's of zero_extract's (insvsi).
> > >>
> > >> Aha!  Yes, that'll probably be it.  It should be easy to fix cse to
> > >> recognize those too.
> > >
> > > I'm not familiar with the gcc source yet, but just in case I get the
> > > time to look at this, could anyone give me a file/line ref to dive
> > > into and examine?
> >
> > Would you believe cse.c?  :-)
> >
> > I can't find the line without investigating further.
> >
> > Andrew.
> >
> > P.S. This is a nontrivial task if you don't know gcc, but might be a
> > good place for a beginner to start.  OTOH, might be hard: no way to
> > know without digging.
>
> I've digged a little bit and this optimizes the testcase on PowerPC 32-bit.
> The patch is completely untested though.
>
> On PowerPC 64-bit which apparently doesn't use ZERO_EXTRACT in this case I
> see a different issue.  It generates
>         li 3,0
>         ori 3,3,32820
>         sldi 3,3,16
> while IMHO 2 insns to load the constant would be completely sufficient,

Indeed.

> apparently rs6000_emit_set_long_const needs work.
>         lis 3,0x8034
>         extsw 3,3
> or
>         li 3,0x401a
>         sldi 3,3,17
> etc. do IMHO the same.

Huh? I don't think so:

- the first one loads 0xffff_ffff_8034_0000 in r3, and the extsw looks
  redundant
- the second one ends up with 0x0000_0000_8034_0000 in r3, and looks
  optimal.

	Gabriel
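
P.S. For anyone who wants to check the arithmetic, here is a rough Python sketch that models the three sequences on a 64-bit register. The helper names (li, lis, ori, sldi, extsw, sext) are made up for illustration and model only the behaviour needed here: a sign-extended 16-bit immediate for li/lis, a zero-extended one for ori, and results truncated to 64 bits. It is not taken from gcc or from any simulator.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value into a 64-bit register image."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK64

def li(imm):
    # li rD,imm: load a sign-extended 16-bit immediate
    return sext(imm, 16)

def lis(imm):
    # lis rD,imm: load imm << 16, sign-extended from 32 to 64 bits
    return sext((imm & 0xFFFF) << 16, 32)

def ori(reg, imm):
    # ori rA,rS,imm: OR with a zero-extended 16-bit immediate
    return reg | (imm & 0xFFFF)

def sldi(reg, n):
    # sldi rA,rS,n: shift left doubleword, discarding bits above bit 63
    return (reg << n) & MASK64

def extsw(reg):
    # extsw rA,rS: sign-extend the low 32 bits
    return sext(reg, 32)

generated = sldi(ori(li(0), 32820), 16)   # the three-insn sequence gcc emits
first = extsw(lis(0x8034))                # lis + extsw
second = sldi(li(0x401a), 17)             # li + sldi

for name, value in (("generated", generated), ("first", first), ("second", second)):
    print(f"{name:10s} {value:#018x}")
```

Under this model only the li/sldi pair reproduces the value of the generated sequence; the lis variant leaves the upper 32 bits set, and extsw does not change what lis already produced.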