Hi,

I am investigating a degradation in SPEC2017 exchange2_r: with loop vectorization enabled at -O2, it degrades by 6%. After some isolation, I found that it isn't caused by vectorization itself but is exposed by it: some statements for the vectorization condition checks are hoisted out of the loop, they increase the register pressure, and this finally results in more spilling than before. If I simply disable tree lim4, the gap becomes smaller (just 40%+ of the original); if I further disable rtl lim, it drops to 30% of the original. This seems to indicate there is some room for improvement in both LIMs.
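To illustrate the kind of effect I mean, here is a minimal hand-written C sketch. It is not taken from exchange2_r; the function names, parameters and loop shape are made up for illustration, and stride is assumed to be positive. Both functions compute the same result, but in the second one the invariant bounds are hoisted by hand (mimicking what LIM does), so lo and hi stay live across the whole outer loop and compete for registers with everything else in it.

```c
#include <stddef.h>

/* Not hoisted: the invariant bounds are computed right before use, so
   they need not stay live across the rest of the outer loop body.  */
long sum_not_hoisted(const long *a, size_t n, long start, long stride)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* ... imagine other register-hungry work here ... */
        long lo = start * stride;      /* loop invariant */
        long hi = lo + 4 * stride;     /* loop invariant */
        for (long j = lo; j < hi; j += stride)
            total += a[i] + j;
    }
    return total;
}

/* Hoisted by hand: lo and hi are computed once, but are now live for
   the whole outer loop, which lengthens their live intervals.  */
long sum_hoisted(const long *a, size_t n, long start, long stride)
{
    long total = 0;
    long lo = start * stride;
    long hi = lo + 4 * stride;
    for (size_t i = 0; i < n; i++) {
        /* ... imagine other register-hungry work here ... */
        for (long j = lo; j < hi; j += stride)
            total += a[i] + j;
    }
    return total;
}
```

With only two hoisted values this is harmless, of course; the concern is the case where a dozen or so such values are hoisted out of a loop nest that is already short of registers.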
From a quick scan of tree LIM, I noticed that it doesn't seem to take register pressure into consideration at all. Is that intentional? I am wondering what the design philosophy behind it is. Is it because it's hard to model register pressure well at this point? If so, that seems to put the burden on the later RA, which then needs good rematerialization support.

btw, the example loop is at line 1150 of src exchange2.fppized.f90:

   1150 block(rnext:9, 7, i7) = block(rnext:9, 7, i7) + 10

The extra hoisted statements after vectorization of this loop (cheap cost model btw) are:

    _686 = (integer(kind=8)) rnext_679;
    _1111 = (sizetype) _19;
    _1112 = _1111 * 12;
    _1927 = _1112 + 12;
  * _1895 = _1927 - _2650;
    _1113 = (unsigned long) rnext_679;
  * niters.6220_1128 = 10 - _1113;
  * _1021 = 9 - _1113;
  * bnd.6221_940 = niters.6220_1128 >> 2;
  * niters_vector_mult_vf.6222_939 = niters.6220_1128 & 18446744073709551612;
    _144 = niters_vector_mult_vf.6222_939 + _1113;
    tmp.6223_934 = (integer(kind=8)) _144;
    S.823_1004 = _1021 <= 2 ? _686 : tmp.6223_934;
  * ivtmp.6410_289 = (unsigned long) S.823_1004;

PS: * indicates that the statement has a long live interval.

BR,
Kewen