Hi Sterling,

    Attached please find the testcase for the spill issue. Try it out with
the patch :-)


> 
> On Wed, Oct 15, 2014 at 7:10 PM, Yangfei (Felix) <felix.y...@huawei.com>
> wrote:
> > Hi Sterling,
> >
> >     Since the patch has been delayed for a long time, I'm kind of
> > pushing it. Sorry for that.
> >     Yeah, you are right. We have a performance issue here, as GCC may
> > use one more general register in some cases with this patch.
> >     Take the following arraysum testcase for example. In the doloop
> > optimization, GCC figures out that the number of iterations is 1024 and
> > creates a new pseudo 79 as the new trip count register.
> >     The pseudo 79 is live throughout the loop, which makes the register
> > pressure in the loop higher. And it's possible that this new pseudo is
> > spilled by reload when the register pressure is very high.
> >     I know that the xtensa loop instruction copies the trip count
> > register into the LCOUNT special register. And we need to describe this
> > hardware feature in GCC in order to free the trip count register.
> >     But I find it difficult to do. Do you have any good suggestions on
> > this?
> 
> There are two issues related to the trip count, one I would like you to
> solve now, one later.
>
> 1. Later: The trip count doesn't need to be updated at all inside these
> loops, once the loop instruction executes. The code below relates to this
> case.
>
> 2. Now: You should be able to use a loop instruction regardless of whether
> the trip count is spilled. If you have an example where that wouldn't
> work, I would love to see it.
> 
void
foo (unsigned f, long v, unsigned *w, unsigned a, unsigned b, unsigned e,
     unsigned c, unsigned d)
{
  unsigned h = v / 4, x[16];
  while (f < h)
    {
      unsigned i;
      f++;
      a |= (a >> 30);
      d = (d << 30) | ((unsigned) d >> 30);
      c = (c << 30) | ((unsigned) c >> 30);
      b = 30 | ((unsigned) b >> 30);
      d += a = (a << 30) | ((unsigned) a >> 2);
      c += ((d << 5) | (d >> 27)) + ((e & (a ^ b))) + 0x5a827999 + x[12];
      a += (c & e);
      c = 30 | ((unsigned) c);
      i = x[5] ^ x[7] ^ x[8] ^ x[3];
      x[5] = (i << 1) | ((unsigned) i >> 31);
      i = x[6] ^ x[2] ^ x[14] ^ x[13];
      x[6] = (i << 1) | (i >> 31);
      b += (c | (c >> 5)) + (d ^ e) + 0x6ed9eba1
           + (x[7] = (i << 1) | ((unsigned) i >> 31));
      x[8] = i | 1;
      e += (a | 5) + b + (i = x[9] ^ x[6], x[10] = (i << (unsigned) i));
      e = 30 | ((unsigned) e >> 30);
      i = x[12] ^ x[14] ^ x[12] ^ x[12], (x[12] = 1 | ((unsigned) i));
      i = x[13] ^ x[5] ^ x[10], (x[13] = (i << (unsigned) 1));
      i = x[2] ^ x[7] ^ x[12], (x[15] = i | ((unsigned) i >> 1));
      i = x[2] ^ x[0] ^ x[13], (x[0] = (i << 1) | 31);
      e = (e << 30) | 2;
      i = x[14] ^ x[2] ^ x[15], (x[2] = i | 1);
      x[3] = i | ((unsigned) i);
      i = x[14] ^ x[12] ^ x[4], (x[4] = 1 | ((unsigned) i >> 1));
      x[5] = i | 1;
      e = (e << 30) | 30;
      b += (5 | ((unsigned) e >> 5)) + 0x8f1bbcdc
           + (x[9] = (i | ((unsigned) i >> 1)));
      i = x[2] ^ (x [10] = ((i << 1) | (i >> 1)));
      x[13] = (i | ((unsigned) i >> 1));
      (i = x[14] ^ x[0] ^ x[14], (x[14] = ((i << 1) | 31)));
      a = *w += a;
    }
}
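
For reference, the counted-loop shape discussed in the quoted text can be sketched in plain C. This is only an illustration, not the arraysum testcase from the earlier mail (which is not reproduced in this thread): the names arraysum, arraysum_counted, arraysum_selfcheck and the variable trip are made up here, and real doloop output is RTL, not C. The second function just shows the extra down counter, which plays the role of pseudo 79, staying live across the whole loop body.

```c
#include <assert.h>

#define N 1024  /* iteration count mentioned in the quoted mail */

/* Hypothetical stand-in for an "arraysum"-style loop: the trip count
   (1024) is known before the loop is entered.  */
int
arraysum (const int *a)
{
  int sum = 0;
  for (int i = 0; i < N; i++)
    sum += a[i];
  return sum;
}

/* Hand-written picture of the same loop with a separate down counter.
   `trip' is an assumption standing in for the new trip count pseudo:
   it is live from loop entry to the final decrement-and-branch, so it
   occupies one more register (or a stack slot if it gets spilled).  */
int
arraysum_counted (const int *a)
{
  int sum = 0;
  unsigned trip = N;
  do
    {
      sum += *a++;
    }
  while (--trip != 0);
  return sum;
}

/* Small self-check: both forms must compute the same sum.  */
void
arraysum_selfcheck (void)
{
  int a[N];
  for (int i = 0; i < N; i++)
    a[i] = i;
  assert (arraysum (a) == arraysum_counted (a));
}
```

On a target whose loop instruction copies the count into a special register, the counter in the second form would ideally not need a general register inside the body at all, which is the "later" issue in the quoted reply.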
