On Sat, Aug 09, 2014 at 08:08:21PM +0200, Branko Čibej wrote: > On 09.08.2014 15:55, Alan Modra wrote: > > Using current git sources compiled with gcc-4.9.1 or mainline gcc at > > -O3 results in failure of random-test. > > > > On powerpc64le we see > > PASS: random-test 1: random delta test > > svn_tests: E200006: mismatch at position xxxxx > > FAIL: random-test 2: random combine delta test > > > > x86_64 gives > > PASS: random-test 1: random delta test > > svn_tests: E200006: Test crashed (run in debugger with '--allow-segfaults') > > FAIL: random-test 2: random combine delta test > > > > After some digging, I narrowed the failure down to > > subversion/libsvn_delta/text_delta.c, and the specific -O3 options > > causing the failure to -ftree-loop-vectorize -fvect-cost-model=dynamic. > > At first I thought I'd found a vectorizer bug, but the real problem is > > a bug in the source, specifically this line in patterning_copy: > > > > *(apr_uint32_t *)(target) = *(apr_uint32_t *)(source); > > > > Quoting from ISO/IEC 9899:1999 > > 6.3.2.3 Pointers > > ... > > 7 A pointer to an object or incomplete type may be converted to a > > pointer to a different object or incomplete type. If the resulting > > pointer is not correctly aligned for the pointed-to type, the behavior > > is undefined. > > > > So here we have undefined behaviour if "source" and "target" are not > > 4-byte aligned.. Fixed as follows. > > Nope. This code won't fly because it's not portable across compilers and > platforms.
In that case, consider my email a bug report. I've done the hard work debugging and analyzing the problem. Fixing it is trivial, for instance by using typedef struct _unaligned { char u[4]; } unaligned32; in the patch I posted rather than typedef apr_uint32_t __attribute__ ((aligned (1))) unaligned32; > The way to fix this is to make sure that the macro > SVN_UNALIGNED_ACCESS_IS_OK gives the correct answer; and it's OK if > that answer compiler-specific, not just platform-specific. In other > words, it's fine if the macro gives a different answer depending on > GCC vectorizer options. If it were up to me I'd revert r956593 entirely. Thinking you can optimise memcpy is just plain wrong-headed, and this is the second bug found in patterning_copy(). Hmm, actually if you want to optimise patterning_copy() then something that might make sense is to detect the overlap case and copy the original repeating sequence multiple times, rather than copying what has just been written. That is likely to give better cache performance especially on processors that suffer from load-hit-store stalls. -- Alan Modra Australia Development Lab, IBM