Hi,

I am investigating a degradation in SPEC2017 exchange2_r: with loop
vectorization enabled at -O2, it degrades by 6%.  After some
isolation, I found that it isn't directly caused by vectorization
itself but is exposed by it: some statements for the vectorization
condition checks are hoisted out of the loop and increase the
register pressure, which finally results in more spilling than
before.  If I simply disable tree lim4, the gap becomes smaller (just
40%+ of the original); if I further disable rtl lim, it shrinks to
30% of the original.  This seems to indicate that there is some room
for improvement in both LIMs.
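
To illustrate the kind of effect I mean, here is a minimal C sketch.
The function names are hypothetical and the code is unrelated to
exchange2; it only shows the trade-off.  Hoisting invariant
expressions by hand saves work per iteration, but every hoisted value
then has to stay live across the whole loop.

```c
#include <stddef.h>

/* Invariants recomputed inside the loop: more work per iteration,
   but no extra value has to stay live across the loop.  */
long sum_in_loop(const long *a, size_t n, long x, long y)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += a[i] * (x + y) + (x - y) + (x * y);
  return s;
}

/* The same computation with the invariants hoisted by hand:
   t1, t2 and t3 are live for the whole loop, so each one needs a
   register or a stack slot for the loop's entire duration.  */
long sum_hoisted(const long *a, size_t n, long x, long y)
{
  long t1 = x + y;
  long t2 = x - y;
  long t3 = x * y;
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += a[i] * t1 + t2 + t3;
  return s;
}
```

Both functions return the same value; only the live ranges differ.
With few free registers, the hoisted form may end up spilling t1..t3.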

From a quick scan of tree LIM, I noticed that it doesn't seem to
take register pressure into consideration at all.  Is that
intentional?  I am wondering what the design philosophy behind it is.
Is it because it's hard to model register pressure well at this
point?  If so, it seems to put the burden onto the later RA, which
then needs good rematerialization support.

Btw, the example loop is at line 1150 of the source file
exchange2.fppized.f90:

   1150 block(rnext:9, 7, i7) = block(rnext:9, 7, i7) + 10

The extra statements hoisted after vectorizing this loop (with the
cheap cost model, btw) are:

    _686 = (integer(kind=8)) rnext_679;
    _1111 = (sizetype) _19;
    _1112 = _1111 * 12;
    _1927 = _1112 + 12;
  * _1895 = _1927 - _2650;
    _1113 = (unsigned long) rnext_679;
  * niters.6220_1128 = 10 - _1113;
  * _1021 = 9 - _1113;
  * bnd.6221_940 = niters.6220_1128 >> 2;
  * niters_vector_mult_vf.6222_939 = niters.6220_1128 & 18446744073709551612;
    _144 = niters_vector_mult_vf.6222_939 + _1113;
    tmp.6223_934 = (integer(kind=8)) _144;
    S.823_1004 = _1021 <= 2 ? _686 : tmp.6223_934;
  * ivtmp.6410_289 = (unsigned long) S.823_1004;

PS: * marks the statements whose results have a long live interval.
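
To make the arithmetic in the dump concrete, here is a small C sketch
of the iteration-count statements only (the address computation
_1111.._1895 is left out).  The struct and function names are
hypothetical.  VF = 4 is an assumption inferred from the ">> 2" and
the "& ~3" mask, and reading S.823_1004 as the start index of the
scalar epilogue is my interpretation of the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the hoisted values.  */
struct vect_bounds
{
  uint64_t niters;         /* niters.6220_1128 */
  uint64_t bnd;            /* bnd.6221_940 */
  uint64_t niters_vf;      /* niters_vector_mult_vf.6222_939 */
  int64_t epilogue_start;  /* S.823_1004 (interpretation) */
};

/* Mirrors the hoisted statements for the loop over rnext..9.  */
static struct vect_bounds compute_bounds(int32_t rnext)
{
  /* Assumption: the index is in range, so the loop runs at least once.  */
  assert(rnext >= 1 && rnext <= 9);

  struct vect_bounds b;
  uint64_t u = (uint64_t) rnext;                     /* _1113 */
  b.niters = 10 - u;                                 /* scalar iterations */
  uint64_t last = 9 - u;                             /* _1021 */
  b.bnd = b.niters >> 2;                             /* vector iterations, VF = 4 */
  b.niters_vf = b.niters & 18446744073709551612ULL;  /* niters rounded down to VF */
  int64_t tmp = (int64_t) (b.niters_vf + u);         /* _144, tmp.6223_934 */
  /* With at most 3 scalar iterations the vector loop is skipped.  */
  b.epilogue_start = last <= 2 ? (int64_t) rnext : tmp;
  return b;
}
```

For rnext = 1 this gives niters = 9, bnd = 2, niters_vf = 8 and an
epilogue start of 9; for rnext = 7 it gives niters = 3, bnd = 0 and
an epilogue start of 7, i.e. everything runs in the scalar loop.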

BR,
Kewen
