https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71237
--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 26 May 2016, andre.simoesdiasvieira at arm dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71237
>
> --- Comment #1 from Andre Vieira <andre.simoesdiasvieira at arm dot com> ---
> So yes disabling LIM will make the tests "PASS". Though I couldn't find an
> option to do this, I disabled the pass by changing passes.def, so that doesn't
> sound like a good idea to test SCCP.

-fno-tree-loop-im

> However, it might be good to point out that at least for arm-none-eabi and
> x86_64-pc-linux-gnu these tests are no longer testing SCCP, SCCP will not
> change this code. I looked at the dumps and compared assembly of -O2 with and
> without '-fno-tree-scev-cprop'.

Yeah, so we're likely looking at bit-rotten SCCP tests ... :/

> On arm-none-eabi, it used to be IVOPTS that made the test pass, it would
> reuse the same ivtmp for computing the address used by the memory
> dereference and the a_p assignment. Now due to the reordering of LIM, it
> will no longer do this.
>
> On x86_64 I see the following code coming out of the OPTIMIZED dump for the
> scev-4.c case:
>
> ...
>   <bb 4>:
>   # ivtmp.10_14 = PHI <_24(3), ivtmp.10_25(4)>
>   i_11 = (int) ivtmp.10_14;
>   MEM[symbol: a, index: ivtmp.10_14, step: 8, offset: 4B] = 100;
>   ivtmp.10_25 = ivtmp.10_14 + _24;
>   i_22 = (int) ivtmp.10_25;
>   if (i_22 <= 999)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
>   <bb 5>:
>   _2 = (sizetype) i_11;
>   _3 = _2 * 8;
>   _10 = _3 + 4;
>   _1 = &a + _10;
>   a_p = _1;
> ...
>
> Now yes the scan-times &a will pass, but that's because the MEM is using
> symbol: a instead of base: &a. Not sure this can be qualified as a proper PASS.
> Disabling LIM here the same way I did before, that is removing the pass_lim
> after pass_laddress and before pass_split_crit_edges generates the following
> OPTIMIZED dump:
>
> ...
>   <bb 4>:
>   _16 = (sizetype) k_4(D);
>   _15 = _16 * 8;
>   _21 = _15 + 4;
>   _22 = &a + _21;
>   ivtmp.9_14 = (unsigned long) _22;
>
>   <bb 5>:
>   # i_11 = PHI <k_4(D)(4), i_8(5)>
>   # ivtmp.9_13 = PHI <ivtmp.9_14(4), ivtmp.9_17(5)>
>   _1 = (int *) ivtmp.9_13;
>   MEM[base: _1, offset: 0B] = 100;
>   i_8 = k_4(D) + i_11;
>   ivtmp.9_17 = ivtmp.9_13 + _15;
>   if (i_8 <= 999)
>     goto <bb 5>;
>   else
>     goto <bb 6>;
>
>   <bb 6>:
>   a_p = _1;
> ...
>
> I prefer this output, since you lose the needless trailing address
> calculation. I am not so sure the eventually generated assembly is
> better in this case though. I'll add both as attachments.

Yes, in a more complicated loop with more register pressure the new
variant could be better.

That said, it would be interesting to see what the testcase originally
was added for and if we can massage it to test that again.
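
For readers without the testsuite at hand, here is a minimal sketch of a loop with the shape the quoted dumps imply: an 8-byte element stride, a store of 100 at byte offset 4, an induction variable stepped by k, and the last computed address kept in a_p. This is reconstructed from the dumps, not copied from scev-4.c. The names a, a_p, i and k appear in the dumps; the struct S, its fields x and y, and the function name f are assumptions made for illustration.

```c
/* Sketch only: reconstructed from the GIMPLE dumps quoted above.
   The struct layout (two ints, 8 bytes on the targets discussed) and
   the names S, x, y and f are assumptions, not taken from the testcase.  */
typedef struct {
  int x;
  int y;   /* the store in the dumps lands at byte offset 4 */
} S;

int *a_p;      /* holds the last address computed in the loop */
S a[1000];

/* k is assumed to be positive; otherwise the loop does not terminate.  */
void
f (int k)
{
  int i;

  for (i = k; i < 1000; i += k)
    {
      a_p = &a[i].y;   /* address computation on the induction variable */
      *a_p = 100;      /* the "= 100" store seen in both dumps */
    }
}
```

Compiling such a file with -O2 -fdump-tree-optimized, once with and once without -fno-tree-loop-im, is one way to compare the two variants without editing passes.def.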