On Thu, 11 May 2017, Uros Bizjak wrote:

> On Thu, May 11, 2017 at 2:48 PM, Richard Biener <rguent...@suse.de> wrote:
> > On Thu, 11 May 2017, Rainer Orth wrote:
> >
> >> Hi Richard,
> >>
> >> > On Mon, 24 Apr 2017, Richard Biener wrote:
> >> >>
> >> >> One issue in PR79201 is that we don't sink pure/const calls which is
> >> >> what the following simple patch fixes.
> >> >>
> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >> >
> >> > Needed some gimple_assign_lhs -> gimple_get_lhs adjustments and
> >> > adjustment of gcc.target/i386/pr22152.c where we now sink the
> >> > assignment out of the pointless loop.  Not sure what the original
> >> > bug was about (well, reg allocation) so I simply disabled sinking
> >> > for it.
> >> >
> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >> >
> >> > Richard.
> >> >
> >> > 2017-04-25  Richard Biener  <rguent...@suse.de>
> >> >
> >> > 	PR tree-optimization/79201
> >> > 	* tree-ssa-sink.c (statement_sink_location): Handle calls.
> >> >
> >> > 	* gcc.dg/tree-ssa/ssa-sink-16.c: New testcase.
> >> > 	* gcc.target/i386/pr22152.c: Disable sinking.
> >>
> >> however, gcc.target/i386/pr22152.c FAILs now for 32-bit:
> >>
> >> FAIL: gcc.target/i386/pr22152.c scan-assembler-times movq[ \\\\t]+[^\\n]*%mm 1
> >
> > I remember seeing this and was not able to make sense of the testcase
> > which was added to fix some backend issue.  Disabling sinking doesn't
> > work (IIRC) as it is required to generate the original code as well.
> >
> > Uros added the testcase in 2008 -- I think if we want to have a testcase
> > for the original issue we need a different one.  Or simply remove
> > the testcase.
>
> No, there is something going on in the testcase:
>
> .L3:
> 	movq	(%ecx,%eax,8), %mm1
> 	paddq	(%ebx,%eax,8), %mm1
> 	addl	$1, %eax
> 	movq	%mm1, %mm0
> 	cmpl	%eax, %edx
> 	jne	.L3
>
>
> The compiler should allocate %mm0 to movq and paddq to avoid %mm1 ->
> %mm0 move.
> These are all movv1di patterns (they shouldn't interfere
> with movdi), and it is not clear to me why RA allocates %mm1 instead
> of %mm0.

In any case, the testcase no longer tests what it was meant to test,
as the input to RA is now different.

The testcase doesn't make much sense:

__m64
unsigned_add3 (const __m64 * a, const __m64 * b,
	       unsigned int count)
{
  __m64 sum;
  unsigned int i;

  for (i = 1; i < count; i++)
    sum = _mm_add_si64 (a[i], b[i]);

  return sum;
}

that's equivalent to

__m64
unsigned_add3 (const __m64 * a, const __m64 * b,
	       unsigned int count)
{
  __m64 sum;
  unsigned int i;

  if (1 < count)
    sum = _mm_add_si64 (a[count-1], b[count-1]);

  return sum;
}

which means possibly using an uninitialized sum, plus a pointless loop.

Richard.