> > The current x86 memset/memcpy expansion is broken. It miscompiles
> > many programs, including GCC itself. Should it be reverted for now?
>
> There was a problem in the new code doing loopy epilogues.
> I am currently testing the following patch that should fix the problem.
> We could either revert now and I will apply the combined patch, or I hope
> to fix that tonight.

To expand a little bit: I was looking into the code for most of the day
today, and the patch combines several fixes:

 1) The new loopy epilogue code was quite broken. It did not work for
    memset at all because the promoted value was not always initialized;
    I fixed that in the version of the patch that is in mainline now.
    It also misses a bound check in some cases, however. This is fixed by
    the expand_set_or_movmem_via_loop_with_iter change.
 2) I misupdated the Atom description, so 32-bit memset was not expanded
    inline. This is fixed by the memset changes.
 3) decide_alg was broken in two ways: it gave complex algorithms at -O0,
    and it chose the wrong variant when sse_loop is used.
 4) The epilogue loop was output even when it is not needed, e.g. when
    the unrolled loops handle 16 bytes at once and the block size is 39.
    This is the ix86_movmem and ix86_setmem change.
 5) The implementations of ix86_movmem and ix86_setmem diverged for no
    reason, so I brought them back in sync. For some reason the SSE code
    in movmem was not output for 64-bit unaligned memcpy; that is fixed
    too.
 6) It seems that both bdver and core are good enough at handling
    misaligned blocks that the alignment prologues can be omitted. This
    greatly improves the inline sequence and reduces its size. I will,
    however, break this out into an independent patch.

Life would be easier if the changes were made in multiple incremental
steps. Stringops expansion is relatively tricky business and relatively
easy to get wrong in some cases, since there are so many of them,
depending on knowledge of size/alignment and target architecture.

Honza
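
For illustration of point 4 only: the following is a minimal C model, not
the GCC expander code. It assumes a main loop that stores 16 bytes per
iteration and shows why the leftover bytes then need no loop of their own.
The names CHUNK, store_n, tail_bytes and sketch_memset are hypothetical and
do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

enum { CHUNK = 16 };  /* bytes per main-loop iteration (assumed value) */

/* Stand-in for one n-byte store; real expanded code would use a single move. */
static void store_n(unsigned char *dst, unsigned char value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}

/* Bytes left over once the main loop can no longer store a full chunk. */
size_t tail_bytes(size_t count)
{
    return count % CHUNK;
}

/* Hypothetical model of an inline memset: main loop plus loop-free epilogue. */
void sketch_memset(unsigned char *dst, unsigned char value, size_t count)
{
    size_t done = 0;

    /* Main loop: one full chunk per iteration. */
    while (count - done >= CHUNK) {
        store_n(dst + done, value, CHUNK);
        done += CHUNK;
    }

    /* The tail is always shorter than one chunk, so at most four
       conditional stores (8, 4, 2, 1 bytes) cover it without a loop. */
    size_t tail = count - done;
    assert(tail < CHUNK && tail == tail_bytes(count));

    if (tail & 8) { store_n(dst + done, value, 8); done += 8; }
    if (tail & 4) { store_n(dst + done, value, 4); done += 4; }
    if (tail & 2) { store_n(dst + done, value, 2); done += 2; }
    if (tail & 1) { store_n(dst + done, value, 1); }
}
```

With a block size of 39, the main loop in this model runs twice (32 bytes)
and the remaining 7 bytes are covered by the 4-, 2- and 1-byte stores, so an
epilogue loop would be dead weight. A looping epilogue only becomes
interesting when the main-loop chunk is much larger than the widest store
used for the tail.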