http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50955
bin.cheng <amker.cheng at gmail dot com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |amker.cheng at gmail dot com --- Comment #17 from bin.cheng <amker.cheng at gmail dot com> --- Hi Richard, I am having difficulty in understanding cases if this PR. For the reported case with two loops: for( y=0; y<4; y++, pDst += dstStep ) { for( x=y+1; x<4; x++ ) { s = ( p1[x-y-1] + p1[x-y] + p1[x-y] + p1[x-y+1] + 2 ) >> 2; pDst[x] = (unsigned char)s; } pDst[y] = p3; } The dump for statement 'pDst[y] = p3;' before IVOPT is like: <bb 4>: Invalid sum of incoming frequencies 1667, should be 278 y.2_64 = (sizetype) y_89; D.6421_65 = pDst_88 + y.2_64; *D.6421_65 = p3_37; pDst_69 = pDst_88 + pretmp.21_118; ivtmp.35_116 = ivtmp.35_87 - 1; if (ivtmp.35_116 != 0) goto <bb 18>; else goto <bb 19>; IVOPT chooses candidate 15: candidate 15 depends on 3 var_before ivtmp.154 var_after ivtmp.154 incremented before exit test type unsigned int base (unsigned int) pDst_39(D) - (unsigned int) &p1 step (unsigned int) (pretmp.21_118 + 1) for use 1: use 1 address in statement *D.6421_65 = p3_37; at position *D.6421_65 type unsigned char * base pDst_39(D) step pretmp.21_118 + 1 base object (void *) pDst_39(D) related candidates After rewriting, the dump is like: <bb 4>: Invalid sum of incoming frequencies 1667, should be 278 MEM[symbol: p1, index: ivtmp.154_200, offset: 0B] = p3_37; pDst_69 = pDst_88 + pretmp.21_118; ivtmp.149_218 = ivtmp.149_249 - 1; ivtmp.154_190 = ivtmp.154_200 + D.6617_250; if (x_40 != 4) goto <bb 18>; else goto <bb 19>; Eventually, the storing to TMR[p1,ivtmp,0] is considered local and deleted. BUT, for your reduced case: p3 = (unsigned char)(((signed int)p1[1] + (signed int)p2[1] + (signed int)p1[0] +(signed int)p1[0] + 2 ) >> 2 ); for( x=y+1; x<4; x++ ) { s = ( p1[x-y-1] + p1[x-y] + p1[x-y] + p1[x-y+1] + 2 ) >> 2; pDst[x] = (unsigned char)s; } pDst[y] = p3; It is about the the TMR in below dump (before IVOPT): <bb 6>: # vect_pp1.30_166 = PHI <vect_pp1.30_167(16), vect_pp1.33_165(5)> # vect_pp1.37_176 = PHI <vect_pp1.37_177(16), vect_pp1.40_175(5)> # vect_pp1.46_194 = PHI <vect_pp1.46_195(16), vect_pp1.49_193(5)> # vect_p.60_223 = PHI <vect_p.60_224(16), vect_p.63_222(5)> # ivtmp.64_225 = PHI <ivtmp.64_226(16), 0(5)> ... MEM[(unsigned char *)vect_p.60_223] = vect_var_.58_219; vect_pp1.30_167 = vect_pp1.30_166 + 8; vect_pp1.37_177 = vect_pp1.37_176 + 8; vect_pp1.46_195 = vect_pp1.46_194 + 8; vect_p.60_224 = vect_p.60_223 + 8; ivtmp.64_226 = ivtmp.64_225 + 1; if (ivtmp.64_226 < bnd.27_128) goto <bb 16>; else goto <bb 7>; Your patch prevents IVOPT from choosing cand 4: candidate 4 (important) var_before ivtmp.110 var_after ivtmp.110 incremented before exit test type unsigned int base (unsigned int) (&p1 + 8) step 8 base object (void *) &p1 for use 3: use 3 generic in statement vect_p.60_223 = PHI <vect_p.60_224(16), vect_p.63_222(5)> at position type vector(8) unsigned char * base batmp.61_221 + 1 step 8 base object (void *) batmp.61_221 is a biv related candidates To prevent IVOPT from rewriting into: <bb 6>: # ivtmp.107_150 = PHI <ivtmp.107_256(16), 0(5)> # ivtmp.110_241 = PHI <ivtmp.110_146(16), ivtmp.110_132(5)> D.6585_133 = (unsigned int) batmp.61_221; p1.131_277 = (unsigned int) &p1; D.6587_278 = D.6585_133 - p1.131_277; D.6588_279 = D.6587_278 + ivtmp.110_241; D.6589_280 = D.6588_279 + 4294967289; D.6590_281 = (vector(8) unsigned char *) D.6589_280; vect_p.60_223 = D.6590_281; ... MEM[(unsigned char *)vect_p.60_223] = vect_var_.58_219; ivtmp.107_256 = ivtmp.107_150 + 1; ivtmp.110_146 = ivtmp.110_241 + 8; if (ivtmp.107_256 < bnd.27_128) goto <bb 16>; else goto <bb 7>; Thus prevents IVOPT from generating candidate 15 in outer loop. (Expressing use 3 by cand 4 itself is good, right?) ------------------------------- But, It seems because the check: if (address_p) { /* Do not try to express address of an object with computation based on address of a different object. This may cause problems in rtl level alias analysis (that does not expect this to be happening, as this is illegal in C), and would be unlikely to be useful anyway. */ if (use->iv->base_object && cand->iv->base_object && !operand_equal_p (use->iv->base_object, cand->iv->base_object, 0)) return infinite_cost; failed because cand(15)->iv->base_object == NULL. For the reported case, it's not about an iv use appearing in memory reference while not marked as address_p, and can be fixed by revise the existing check condition, is it true? PS, sorry for replying to a fixed PR, I found it's kind of impossible to fix PR52272 without fully understanding this one.