Hello!

The attached patch fine-tunes the condition under which prefetchw write-prefetch insns are emitted. prefetchw is preferred for non-SSE2 K7 Athlons (this is covered by the i386-prefetch.exp tests). SSE prefetches, on the other hand, are preferred for K8 targets, as measured and reported in PR 77270.

For newer targets, the PRFCHW cpuid bit is respected, and -march=native correctly emits prefetchw when that bit is set. (On a related note, PTA_PRFCHW should probably be set for amdfam10+ targets; Venkataramanan is looking into this issue.)

2016-08-21  Uros Bizjak  <ubiz...@gmail.com>

	PR target/77270
	* config/i386/i386.md (prefetch): When TARGET_PRFCHW or
	TARGET_PREFETCHWT1 are disabled, emit 3dNOW! write prefetches for
	non-SSE2 athlons only, otherwise prefer SSE prefetches.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.

Index: i386.md
===================================================================
--- i386.md	(revision 239642)
+++ i386.md	(working copy)
@@ -18634,20 +18634,24 @@
   gcc_assert (IN_RANGE (locality, 0, 3));
 
   /* Use 3dNOW prefetch in case we are asking for write prefetch not
-     supported by SSE counterpart or the SSE prefetch is not available
-     (K6 machines).  Otherwise use SSE prefetch as it allows specifying
-     of locality.  */
+     supported by SSE counterpart (non-SSE2 athlon machines) or the
+     SSE prefetch is not available (K6 machines).  Otherwise use SSE
+     prefetch as it allows specifying of locality.  */
 
   if (write)
     {
       if (TARGET_PREFETCHWT1)
         operands[2] = GEN_INT (MAX (locality, 2));
-      else if (TARGET_3DNOW || TARGET_PRFCHW)
+      else if (TARGET_PRFCHW)
         operands[2] = GEN_INT (3);
+      else if (TARGET_3DNOW && !TARGET_SSE2)
+        operands[2] = GEN_INT (3);
+      else if (TARGET_PREFETCH_SSE)
+        operands[1] = const0_rtx;
       else
         {
-          gcc_assert (TARGET_PREFETCH_SSE);
-          operands[1] = const0_rtx;
+          gcc_assert (TARGET_3DNOW);
+          operands[2] = GEN_INT (3);
         }
     }
   else
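
For readers who want to see the new priority order outside the expander, here is a minimal standalone C sketch of the write-prefetch branch after this patch. It is only a model, not GCC code: struct target_flags, struct prefetch_ops and adjust_write_prefetch are hypothetical names made up for illustration, the boolean fields stand in for the TARGET_* macros, and the write/locality fields stand in for operands[1] and operands[2].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TARGET_* macros tested in the expander.  */
struct target_flags
{
  bool prefetchwt1;   /* models TARGET_PREFETCHWT1 */
  bool prfchw;        /* models TARGET_PRFCHW */
  bool three_dnow;    /* models TARGET_3DNOW */
  bool sse2;          /* models TARGET_SSE2 */
  bool prefetch_sse;  /* models TARGET_PREFETCH_SSE */
};

/* Hypothetical stand-ins for operands[1] (write) and operands[2] (locality).  */
struct prefetch_ops
{
  int write;
  int locality;
};

/* Model of the patched "if (write)" branch, in the same order as the diff.  */
struct prefetch_ops
adjust_write_prefetch (struct target_flags t, int locality)
{
  struct prefetch_ops ops = { 1, locality };

  assert (locality >= 0 && locality <= 3);

  if (t.prefetchwt1)
    ops.locality = locality > 2 ? locality : 2;  /* MAX (locality, 2) */
  else if (t.prfchw)
    ops.locality = 3;
  else if (t.three_dnow && !t.sse2)
    ops.locality = 3;           /* non-SSE2 athlon: keep the write prefetch */
  else if (t.prefetch_sse)
    ops.write = 0;              /* fall back to SSE prefetch, keep locality */
  else
    {
      assert (t.three_dnow);    /* only 3dNOW! is left at this point */
      ops.locality = 3;
    }

  return ops;
}
```

With this model, a target that has 3dNOW! and SSE prefetch but no SSE2 keeps write = 1 with locality 3, while a target with 3dNOW!, SSE2 and SSE prefetch but no PRFCHW gets write = 0 and its requested locality preserved, which matches the K7 versus K8 behaviour described above.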