Hi Sterling,

Attached please find the testcase for the spill issue. Try it out with the patch :-)
> On Wed, Oct 15, 2014 at 7:10 PM, Yangfei (Felix) <felix.y...@huawei.com> wrote:
> > Hi Sterling,
> >
> > Since the patch has been delayed for a long time, I'm kind of pushing it. Sorry for that.
> >
> > Yeah, you are right. We have a performance issue here, as GCC may use one more general register in some cases with this patch.
> > Take the following arraysum testcase for example. In doloop optimization, GCC figures out that the number of iterations is 1024 and creates a new pseudo 79 as the new trip count register.
> > Pseudo 79 is live throughout the loop, which makes the register pressure in the loop higher. And it's possible that this new pseudo is spilled by reload when the register pressure is very high.
> > I know that the xtensa loop instruction copies the trip count register into the LCOUNT special register. And we need to describe this hardware feature in GCC in order to free the trip count register.
> > But I find it difficult to do. Do you have any good suggestions on this?
>
> There are two issues related to the trip count, one I would like you to solve now, one later.
>
> 1. Later: The trip count doesn't need to be updated at all inside these loops, once the loop instruction executes. The code below relates to this case.
>
> 2. Now: You should be able to use a loop instruction regardless of whether the trip count is spilled. If you have an example where that wouldn't work, I would love to see it.
void
foo (unsigned f, long v, unsigned *w, unsigned a, unsigned b, unsigned e,
     unsigned c, unsigned d)
{
  unsigned h = v / 4, x[16];

  while (f < h)
    {
      unsigned i;

      f++;
      a |= (a >> 30);
      d = (d << 30) | ((unsigned) d >> 30);
      c = (c << 30) | ((unsigned) c >> 30);
      b = 30 | ((unsigned) b >> 30);
      d += a = (a << 30) | ((unsigned) a >> 2);
      c += ((d << 5) | (d >> 27)) + ((e & (a ^ b))) + 0x5a827999 + x[12];
      a += (c & e);
      c = 30 | ((unsigned) c);
      i = x[5] ^ x[7] ^ x[8] ^ x[3];
      x[5] = (i << 1) | ((unsigned) i >> 31);
      i = x[6] ^ x[2] ^ x[14] ^ x[13];
      x[6] = (i << 1) | (i >> 31);
      b += (c | (c >> 5)) + (d ^ e) + 0x6ed9eba1
           + (x[7] = (i << 1) | ((unsigned) i >> 31));
      x[8] = i | 1;
      e += (a | 5) + b + (i = x[9] ^ x[6], x[10] = (i << (unsigned) i));
      e = 30 | ((unsigned) e >> 30);
      i = x[12] ^ x[14] ^ x[12] ^ x[12], (x[12] = 1 | ((unsigned) i));
      i = x[13] ^ x[5] ^ x[10], (x[13] = (i << (unsigned) 1));
      i = x[2] ^ x[7] ^ x[12], (x[15] = i | ((unsigned) i >> 1));
      i = x[2] ^ x[0] ^ x[13], (x[0] = (i << 1) | 31);
      e = (e << 30) | 2;
      i = x[14] ^ x[2] ^ x[15], (x[2] = i | 1);
      x[3] = i | ((unsigned) i);
      i = x[14] ^ x[12] ^ x[4], (x[4] = 1 | ((unsigned) i >> 1));
      x[5] = i | 1;
      e = (e << 30) | 30;
      b += (5 | ((unsigned) e >> 5)) + 0x8f1bbcdc
           + (x[9] = (i | ((unsigned) i >> 1)));
      i = x[2] ^ (x[10] = ((i << 1) | (i >> 1)));
      x[13] = (i | ((unsigned) i >> 1));
      (i = x[14] ^ x[0] ^ x[14], (x[14] = ((i << 1) | 31)));
      a = *w += a;
    }
}