Hello!

Attached patch fixes a typo in ix86_expand_set_or_movmem, where a wrong define was used in a condition. The patch also adds an additional condition (as proposed by H.J.) as a correctness improvement (although patched gcc bootstraps and passes regression tests without it).
I took the liberty of renaming a couple of (long) constants and fixing some typos while there.

2013-11-11  Uros Bizjak  <ubiz...@gmail.com>
            H.J. Lu  <hongjiu...@intel.com>

        PR target/58853
        * config/i386/x86-tune.def
        (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Rename
        from TARGET_MISALIGNED_MOVE_STRING_PROLOGUES.
        * config/i386/i386.h
        (TARGET_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Rename
        from TARGET_MISALIGNED_MOVE_STRING_PROLOGUES_EPILOGUES.
        Update for renamed X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES.
        * config/i386/i386.c (ix86_expand_set_or_movmem): Use
        TARGET_MISALIGNED_MOVE_STRING_PRO_EPILOGUES to calculate
        misaligned_prologue_used.  Check that
        desired_align <= epilogue_size_needed.

testsuite/ChangeLog:

2013-11-11  Uros Bizjak  <ubiz...@gmail.com>

        PR target/58853
        * gcc.target/i386/pr58853.c: New test.

Patch was approved by Honza in the PR.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 204676)
+++ config/i386/i386.c (working copy)
@@ -23761,13 +23761,15 @@ ix86_expand_set_or_movmem (rtx dst, rtx src, rtx c
     }
   gcc_assert (desired_align >= 1 && align >= 1);
 
-  /* Misaligned move sequences handles both prologues and epilogues at once.
-     Default code generation results in smaller code for large alignments and
-     also avoids redundant job when sizes are known precisely.  */
-  misaligned_prologue_used = (TARGET_MISALIGNED_MOVE_STRING_PROLOGUES
-                              && MAX (desired_align, epilogue_size_needed) <= 32
-                              && ((desired_align > align && !align_bytes)
-                                  || (!count && epilogue_size_needed > 1)));
+  /* Misaligned move sequences handle both prologue and epilogue at once.
+     Default code generation results in a smaller code for large alignments
+     and also avoids redundant job when sizes are known precisely.  */
+  misaligned_prologue_used
+    = (TARGET_MISALIGNED_MOVE_STRING_PRO_EPILOGUES
+       && MAX (desired_align, epilogue_size_needed) <= 32
+       && desired_align <= epilogue_size_needed
+       && ((desired_align > align && !align_bytes)
+           || (!count && epilogue_size_needed > 1)));
 
   /* Do the cheap promotion to allow better CSE across the
      main loop and epilogue (ie one load of the big constant in the
Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h (revision 204676)
+++ config/i386/i386.h (working copy)
@@ -353,8 +353,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_L
 #define TARGET_PROMOTE_QImode ix86_tune_features[X86_TUNE_PROMOTE_QIMODE]
 #define TARGET_FAST_PREFIX ix86_tune_features[X86_TUNE_FAST_PREFIX]
 #define TARGET_SINGLE_STRINGOP ix86_tune_features[X86_TUNE_SINGLE_STRINGOP]
-#define TARGET_MISALIGNED_MOVE_STRING_PROLOGUES_EPILOGUES \
-  ix86_tune_features[TARGET_MISALIGNED_MOVE_STRING_PROLOGUES]
+#define TARGET_MISALIGNED_MOVE_STRING_PRO_EPILOGUES \
+  ix86_tune_features[X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES]
 #define TARGET_QIMODE_MATH ix86_tune_features[X86_TUNE_QIMODE_MATH]
 #define TARGET_HIMODE_MATH ix86_tune_features[X86_TUNE_HIMODE_MATH]
 #define TARGET_PROMOTE_QI_REGS ix86_tune_features[X86_TUNE_PROMOTE_QI_REGS]
Index: config/i386/x86-tune.def
===================================================================
--- config/i386/x86-tune.def (revision 204676)
+++ config/i386/x86-tune.def (working copy)
@@ -257,13 +257,13 @@ DEF_TUNE (X86_TUNE_AVOID_MEM_OPND_FOR_CMOVE, "avoi
    as MOVS and STOS (without a REP prefix) to move/set sequences of bytes.  */
 DEF_TUNE (X86_TUNE_SINGLE_STRINGOP, "single_stringop", m_386 | m_P4_NOCONA)
 
-/* TARGET_MISALIGNED_MOVE_STRING_PROLOGUES: Enable generation of compace
-   prologues and epilogues by issuing a misaligned moves.  This require
-   target to handle misaligned moves and partial memory stalls resonably
-   well.
-   FIXME: This actualy may be a win on more targets than listed here.  */
-DEF_TUNE (TARGET_MISALIGNED_MOVE_STRING_PROLOGUES,
-          "misaligned_move_string_prologues",
+/* X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES: Enable generation of
+   compact prologues and epilogues by issuing a misaligned moves.  This
+   requires target to handle misaligned moves and partial memory stalls
+   reasonably well.
+   FIXME: This may actualy be a win on more targets than listed here.  */
+DEF_TUNE (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES,
+          "misaligned_move_string_pro_epilogues",
           m_386 | m_486 | m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC)
 
 /* X86_TUNE_USE_SAHF: Controls use of SAHF.  */
Index: testsuite/gcc.target/i386/pr58853.c
===================================================================
--- testsuite/gcc.target/i386/pr58853.c (revision 0)
+++ testsuite/gcc.target/i386/pr58853.c (working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-all-stringops" } */
+/* { dg-additional-options "-mtune=pentiumpro" { target { ia32 } } } */
+
+void
+my_memcpy (char *dest, const char *src, int n)
+{
+  __builtin_memcpy (dest, src, n);
+}