On Tue, May 3, 2016 at 12:40 AM, Kumar, Venkataramanan <venkataramanan.ku...@amd.com> wrote:
> Hi
>
>> -----Original Message-----
>> From: NightStrike [mailto:nightstr...@gmail.com]
>> Sent: Monday, May 2, 2016 10:31 PM
>> To: Kumar, Venkataramanan <venkataramanan.ku...@amd.com>
>> Cc: Uros Bizjak (ubiz...@gmail.com) <ubiz...@gmail.com>;
>> lopeziba...@gmail.com; Jan Hubicka <hubi...@ucw.cz>; Jakub Jelinek
>> <ja...@redhat.com>; gcc@gcc.gnu.org
>> Subject: Re: option -mprfchw on 2 different Opteron cpus
>>
>> On Mon, May 2, 2016 at 5:55 AM, Kumar, Venkataramanan
>> <venkataramanan.ku...@amd.com> wrote:
>> >> If I compile on a k8 Opteron 248 with -march=native, I do not see
>> >> -mprfchw listed in the options in -fverbose-asm. In the assembly, I see
>> >> this:
>> >>
>> >> prefetcht0 (%rax) # ivtmp.1160
>> >> prefetcht0 304(%rcx) #
>> >> prefetcht0 (%rax) # ivtmp.1160
>> >
>> > In AMD processors -mprfchw flag is used to enable "3dnowprefetch" ISA
>> > support.
>> >
>> > (Snip)
>> > CPUID Fn8000_0001_ECX Feature Identifiers Bit 8
>> > 3DNowPrefetch: PREFETCH and PREFETCHW instruction support. See
>> > “PREFETCH” and “PREFETCHW” in APM3
>> > Ref: http://support.amd.com/TechDocs/25481.pdf
>> > (Snip)
>> >
>> > Can you please confirm what this CPUID flag returns on your k8 machine?
>> > I believe this ISA is not available on k8 machine so when -march=native is
>> > added you don’t see -mprfchw in verbose.
>>
>> Looks like zero? This was generated with the cpuid program from
>> http://www.etallen.com/cpuid.html
>>
>> 3DNow! instruction extensions = true
>> 3DNow! instructions = true
>
> It has 3DNow support. "prefetchw" is available with 3DNow.
>
>> misaligned SSE mode = false
>> 3DNow! PREFETCH/PREFETCHW instructions = false
>
> It does not have 3DNowPrefetch, so enabling the ISA flag -mprfchw is not
> correct for -march=k8.
>
>> OS visible workaround = false
>> instruction based sampling = false
>>
>> >> If I compile on a bdver2 Opteron 6386 SE with -march=k8 (thus trying
>> >> to target the older system), I do see it listed in the options in
>> >> -fverbose-asm. In the assembly, I see this:
>> >
>> > K8 has 3dnow support and there is a patch that replaced 3dnow with
>> > prefetchw (3DNowPrefetch).
>> > https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00866.html
>> > So when you add -march=k8 you see -mprfchw getting listed in verbose.
>> >
>> >>
>> >> prefetcht0 (%rax) # ivtmp.1160
>> >> prefetcht0 304(%rcx) #
>> >> prefetchw (%rax) # ivtmp.1160
>> >>
>> >> (The third line is the only difference)
>> >>
>> >
>> > This is my guess without seeing the test case, when write prefetching is
>> > requested "prefetchw" is generated.
>> > 3dnow (TARGET_3DNOW) ISA has support for it.
>> >
>> > (Snip)
>> > Support for the PREFETCH and PREFETCHW instructions is indicated by
>> > CPUID Fn8000_0001_ECX[3DNowPrefetch] OR Fn8000_0001_EDX[LM] OR
>> > Fn8000_0001_EDX[3DNow] = 1.
>> > (Snip)
>> > Ref: http://developer.amd.com/wordpress/media/2008/10/24594_APM_v3.pdf
>> >
>> >> In both cases, I'm using gcc 4.9.3. Which is correct for a k8 Opteron
>> >> 248?
>> >>
>> >> Also, FWIW:
>> >>
>> >> 1) The march=native version that uses prefetcht0 is very repeatably
>> >> faster by about 15% in the particular test case I'm looking at.
>> >>
>> >> 2) The compilers in both instances are not just the same version,
>> >> they are the same compiler binary installed on an NFS mount and
>> >> shared to both computers.
>> >
>> > As per GCC4.9.3 source.
>> >
>> > (Snip)
>> > (define_expand "prefetch"
>> >   [(prefetch (match_operand 0 "address_operand")
>> >              (match_operand:SI 1 "const_int_operand")
>> >              (match_operand:SI 2 "const_int_operand"))]
>> >   "TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_PREFETCHWT1"
>> > {
>> >   bool write = INTVAL (operands[1]) != 0;
>> >   int locality = INTVAL (operands[2]);
>> >
>> >   gcc_assert (IN_RANGE (locality, 0, 3));
>> >
>> >   /* Use 3dNOW prefetch in case we are asking for write prefetch not
>> >      supported by SSE counterpart or the SSE prefetch is not available
>> >      (K6 machines). Otherwise use SSE prefetch as it allows specifying
>> >      of locality. */
>> >   if (TARGET_PREFETCHWT1 && write && locality <= 2)
>> >     operands[2] = const2_rtx;
>> >   else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
>> >     operands[2] = GEN_INT (3);
>> >   else
>> >     operands[1] = const0_rtx;
>> > })
>> > (Snip)
>> >
>> > Write prefetch may be requested (either by auto prefetcher or builtins) but
>> > on -march=native, the below check could have become false.
>> > else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
>> > TARGET_PRFCHW is off on native.
>> >
>> > So there are two issues here.
>> >
>> > (1) ISA flags enabled with -march=k8 is different from -march=native on k8
>> > machine.
>
> I think we need to file bug for this. Need to check with Uros why the flag
> -mprfchw is shared with 3dnow.
> To work around this issue you can use -mno-prfchw when building with
> -march=k8.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77270

>> > (2) Need to check why GCC middle end requested write prefetch for the
>> > test case with -march=k8 .
>
> On "prefetchw" generation it may be the case that GCC auto prefetcher
> requests write prefetches.
> AFAIK generating write prefetches brings data from memory and marks the cache
> line modified and expects a write to happen next.
> If read happens to that cache line instead then data will be written back to
> memory before read which will be unnecessary.
> Hard to answer without test case and I don’t have a ready k8 machine with me.

Should another bug be filed if I can get a reduced test case, or is PR77270 enough, or is this not a bug?
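
For reference, a reduced test case along these lines might look like the sketch below. It is an assumption, not the code that produced the assembly quoted above: scale_in_place and PREFETCH_AHEAD are hypothetical names, and the loop requests a write prefetch explicitly through __builtin_prefetch (addr, 1, 3) instead of relying on the auto prefetcher.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; chosen only for illustration. */
#define PREFETCH_AHEAD 16

/* Hypothetical reduced test: scale an array in place while hinting a
   write prefetch (second argument 1) with maximum locality (third
   argument 3) a fixed distance ahead of the current element.  */
void scale_in_place(double *a, size_t n, double k)
{
    for (size_t i = 0; i < n; i++) {
        /* Only prefetch addresses that lie inside the array.  */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 1, 3);
        a[i] *= k;
    }
}
```

Compiling this with "gcc -O2 -march=k8 -S -fverbose-asm", and again with -mno-prfchw added, then comparing the prefetch instructions in the two outputs should show which branch of the expander quoted above is taken. The output has not been verified on either machine.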