Should -msse3 enable fisttp
Hi,

I know this has been discussed in bug 18668, but I'd like to bring it up again. Currently, fisttp is only generated with -march=prescott. The argument is that fisttp is not an SSE instruction. While this is technically true, it's likely to surprise users. Intel, after all, does lump fisttp in with SSE3. The AMD64 Architecture Programmer's Manual also explicitly states that fisttp is an SSE3 instruction.

Please consider enabling fisttp with -msse3.

Thanks,
Evan
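For readers unfamiliar with the instruction: fisttp stores an x87 value as an integer with truncation, independent of the rounding mode in the FPU control word, which is what a C cast from floating point to integer requires. A minimal sketch of the kind of source where it matters follows. The function name is hypothetical, and whether a given GCC emits fisttp here depends on its version and on options such as -march=prescott and -mfpmath=387.

```c
/* Hypothetical helper: a truncating double-to-int conversion.
   C defines the cast as rounding toward zero, whatever the current
   FPU rounding mode is. On x87 this is the operation fisttp performs
   in a single instruction. */
int to_int_trunc(double x)
{
    return (int)x;
}
```

Without fisttp, x87 code typically has to save the control word, switch it to round-toward-zero, execute fistp, and then restore the original control word.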
Re: Should -msse3 enable fisttp?
Hi Uros,

Since you are the one who enabled fisttp, I figured I should send this email to you directly. Let me know what you think. I kind of agree with your argument, but for practical reasons I think -msse3 should enable fisttp. Certainly here at Apple, a few folks have been surprised by this.

Thanks,
Evan
Senior Compiler Engineer
Apple Computers

On Sep 29, 2005, at 1:48 PM, Evan Cheng wrote:
> Hi, I know this has been discussed in bug 18668. But I'd like to bring it up again. Currently, fisttp is only generated with -march=prescott. The argument is fisttp is not a SSE instruction. While this is technically true, it's likely to surprise the users. Intel, after all, does lump fisttp in SSE3. AMD64 architecture programmer's manual also explicitly state fisttp is a SSE3 instruction. Please consider enabling fisttp with -msse3.
>
> Thanks,
> Evan
Re: Should -msse3 enable fisttp
My mistake. I misunderstood the meaning of -msse3 (it only enables the SSE3 builtins). Please ignore.

On Sep 29, 2005, at 1:48 PM, Evan Cheng wrote:
> Hi, I know this has been discussed in bug 18668. But I'd like to bring it up again. Currently, fisttp is only generated with -march=prescott. The argument is fisttp is not a SSE instruction. While this is technically true, it's likely to surprise the users. Intel, after all, does lump fisttp in SSE3. AMD64 architecture programmer's manual also explicitly state fisttp is a SSE3 instruction. Please consider enabling fisttp with -msse3.
>
> Thanks,
> Evan

Evan Cheng
Senior Compiler Engineer
Apple Computer, Inc.
Re: Should -msse3 enable fisttp
Well, both Intel and AMD call fisttp an SSE3 instruction even though it operates on the x87 stack (ST(0)). My argument is that users who specify -msse3 to turn on SSE3 instructions would expect fisttp to be turned on as well. But according to the manual, -msse3 does not turn on generation of SSE3 instructions:

    -mmmx -mno-mmx -msse -mno-sse -msse2 -mno-sse2 -msse3 -mno-sse3 -m3dnow -mno-3dnow

    These switches enable or disable the use of built-in functions that allow direct access to the MMX, SSE, SSE2, SSE3 and 3Dnow extensions of the instruction set. See X86 Built-in Functions, for details of the functions enabled and disabled by these switches. To have SSE/SSE2 instructions generated automatically from floating-point code, see -mfpmath=sse.

Thus the confusion.

Evan

On Oct 3, 2005, at 3:25 PM, Andrew Pinski wrote:
> On Oct 3, 2005, at 5:56 PM, Evan Cheng wrote:
>> My mistake. I misunderstood the meaning of -msse3 (it only enables the sse3 builtins). Please ignore.
>
> Actually it enables more than the builtins. It enables the use of sse3 instructions. This is just like -maltivec on PowerPC and -msse and -msse on x86, etc.
>
> Hmm, but from the original patch:
> http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01119.html
>
> "BTW: Regarding TARGET_FISTTP macro: according to documentation, fisttp insn indeed depends on (TARGET_80387 && TARGET_SSE3). However, this insn is not a SSE3 instruction, so it should not be disabled by -mno-sse3 flag."
>
> And then RTH agreed:
> http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01432.html
>
> So from the sound of it fisttp is not a SSE3 instruction.
>
> Thanks,
> Andrew Pinski
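To illustrate the distinction the manual excerpt draws, here is a minimal sketch of using an SSE3 built-in explicitly through the pmmintrin.h intrinsics. It relies on GCC's predefined __SSE3__ macro, which is set when -msse3 is in effect. sum4 is a hypothetical helper, and the scalar fallback exists only so the file still builds without the flag.

```c
#ifdef __SSE3__
#include <pmmintrin.h>   /* SSE3 intrinsics, e.g. _mm_hadd_ps */
#endif

/* Hypothetical helper: sum four floats. With -msse3 it uses the SSE3
   horizontal add explicitly; otherwise it falls back to scalar code. */
float sum4(const float v[4])
{
#ifdef __SSE3__
    __m128 x = _mm_loadu_ps(v);   /* (v0, v1, v2, v3) */
    x = _mm_hadd_ps(x, x);        /* (v0+v1, v2+v3, v0+v1, v2+v3) */
    x = _mm_hadd_ps(x, x);        /* every lane holds v0+v1+v2+v3 */
    return _mm_cvtss_f32(x);      /* extract lane 0 */
#else
    return (v[0] + v[1]) + (v[2] + v[3]);
#endif
}
```

Here the SSE3 instruction appears because the source asks for it by name. Whether the compiler also chooses SSE3 instructions (or fisttp) on its own for ordinary C code is the separate question this thread is about.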
Need advice: x86 redundant compare to zero
Hi,

Here is a bit of code from bzip2:

    #define mswap(zz1, zz2) { int zztmp = zz1; zz1 = zz2; zz2 = zztmp; }

    void foo(int unLo, int unHi, int ltLo, int *ptr, char *block, int med, int d, int n)
    {
        while (1) {
            if (unLo > unHi) break;
            n = ((int)block[ptr[unLo]+d]) - med;
            if (n == 0) {
                mswap(ptr[unLo], ptr[ltLo]);
                ltLo++; unLo++;
                continue;
            };
            if (n > 0) break;
            unLo++;
        }
    }

gcc produces the following code:

            .text
            .globl _foo
    _foo:
            pushl   %ebp
            movl    %esp, %ebp
            pushl   %edi
            pushl   %esi
            subl    $12, %esp
            movl    8(%ebp), %edx
            cmpl    12(%ebp), %edx
            jg      L10
            movl    16(%ebp), %eax
            movl    20(%ebp), %ecx
            leal    (%ecx,%eax,4), %eax
            movl    %eax, -16(%ebp)
            jmp     L4
    L12:
            movl    -16(%ebp), %ecx
            movl    (%ecx), %eax
            movl    %eax, (%esi)
            movl    -20(%ebp), %edi
            movl    %edi, (%ecx)
            addl    $4, %ecx
            movl    %ecx, -16(%ebp)
            addl    $1, %edx
            cmpl    %edx, 12(%ebp)
            jl      L10
    L13:
            movl    20(%ebp), %ecx
    L4:
            leal    (%ecx,%edx,4), %esi
            movl    (%esi), %edi
            movl    %edi, -20(%ebp)
            movl    24(%ebp), %eax
            addl    %edi, %eax
            movl    32(%ebp), %edi
            movsbl  (%eax,%edi),%eax
            subl    28(%ebp), %eax
            cmpl    $0, %eax            <-- extra compare...
            je      L12
            jg      L10
            addl    $1, %edx
            cmpl    %edx, 12(%ebp)
            jge     L13
    L10:
            addl    $12, %esp
            popl    %esi
            popl    %edi
            popl    %ebp
            ret

The cmpl is not needed because subl has already set the flags. My question is: where and how would you suggest we do this optimization? With peephole2? Or in combine?

In i386.md, I see that the pattern *subsi_2 looks like what I'd like to combine these two insns into:

    (define_insn "*subsi_2"
      [(set (reg FLAGS_REG)
            (compare
              (minus:SI (match_operand:SI 1 "nonimmediate_operand" "0,0")
                        (match_operand:SI 2 "general_operand" "ri,rm"))
              (const_int 0)))
       (set (match_operand:SI 0 "nonimmediate_operand" "=rm,r")
            (minus:SI (match_dup 1) (match_dup 2)))]
      "ix86_match_ccmode (insn, CCGOCmode)
       && ix86_binary_operator_ok (MINUS, SImode, operands)"
      "sub{l}\t{%2, %0|%0, %2}"
      [(set_attr "type" "alu")
       (set_attr "mode" "SI")])

But I do not see a peephole2 that would generate this insn. Does anyone know how this pattern is used? Suggestions are appreciated!

Thanks,
Evan Cheng
Apple Computers, Inc.
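A reduced form of the loop body may be handy for experimenting with combine or peephole2. The sketch below is an illustration under stated assumptions, not part of the original report: classify isolates the subtract-then-test pattern, and the two jg_* helpers (hypothetical names) model in C what jg observes right after subl versus after cmpl $0. The je test agrees in both cases. jg also reads the overflow flag, so the two forms can only differ when the 32-bit subtraction overflows.

```c
#include <stdint.h>

/* Reduced form of the bzip2 loop body: one subtraction followed by
   two tests of the result against zero. */
int classify(int a, int b)
{
    int n = a - b;
    if (n == 0) return 0;
    if (n > 0) return 1;
    return -1;
}

/* Models jg directly after "subl b, a": taken when a > b (signed),
   because the flags reflect the true comparison, including overflow. */
int jg_after_sub(int32_t a, int32_t b)
{
    return a > b;
}

/* Models jg after "cmpl $0, n": taken when the wrapped 32-bit
   difference is positive. The unsigned-to-signed conversion is
   implementation-defined in C; GCC wraps modulo 2^32. */
int jg_after_cmp_zero(int32_t a, int32_t b)
{
    int32_t n = (int32_t)((uint32_t)a - (uint32_t)b);
    return n > 0;
}
```

For in-range differences such as (5, 3) both helpers agree. For (INT32_MIN, 1) the subtraction wraps to INT32_MAX, so jg_after_cmp_zero reports 1 while jg_after_sub reports 0. This may be why the *subsi_2 condition asks for a specific flags mode rather than accepting any consumer of the flags.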
Re: New GCC releases comparison and comparison of GCC4.4 and LLVM2.5 on SPEC2000
On May 13, 2009, at 4:51 AM, Duncan Sands wrote:
> Hi,
>
>> Sorry, I missed to mention that I used an additional option -mpc64 for 32-bit GCC4.4. It is not possible to generate SPECFP2000 expected results by GCC4.4 without this option. LLVM does not support this option. And this option can significantly improve the performance. So 32-bit comparison of SPECFP2000 should be taken with a grain of salt.
>
> what does -mpc64 do exactly? The gcc docs say:
>
>   `-mpc64' rounds the significands of results of floating-point operations to 53 bits (double precision)
>
> Does this mean that a rounding operation is performed after each fp operation, or that optimizations are permitted that don't result in accurate extended double precision values as long as they are correct to 53 bits, or something else?
>
> The LLVM code generators have an option called -limit-float-precision:
>
>   -limit-float-precision= - Generate low-precision inline sequences for some float libcalls
>
> I'm not sure what it does exactly, but perhaps it is similar to -mpc64?

No, that inlines a small set of libcalls into code sequences that implement low-precision math (6, 8, 12 bits).

Evan

> Ciao,
> Duncan.
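The GCC documentation quoted in this thread describes -mpc64 as rounding significands to 53 bits. A minimal sketch of what that difference in significand width means follows. It uses ordinary C types rather than the x87 precision-control setting, so it illustrates the rounding only and not the option itself. The function names are hypothetical, and the long double result depends on the platform's long double format.

```c
#include <float.h>

/* 1 + 2^-60 needs a 61-bit significand to be represented exactly.
   volatile keeps the compiler from folding the arithmetic away. */

/* Returns 1 if the tiny term survives in long double arithmetic
   (e.g. the 64-bit significand of x87 extended precision). */
int tiny_survives_in_long_double(void)
{
    volatile long double one = 1.0L;
    volatile long double tiny = 0x1p-60L;
    volatile long double sum = one + tiny;
    return sum != 1.0L;
}

/* Returns 1 if the tiny term survives once the result is stored as a
   double, whose significand is 53 bits wide. */
int tiny_survives_in_double(void)
{
    volatile double one = 1.0;
    volatile double tiny = 0x1p-60;
    volatile double sum = one + tiny;   /* the store rounds to 53 bits */
    return sum != 1.0;
}
```

With a 53-bit significand the sum rounds back to exactly 1.0, which is the kind of difference that can change whether a benchmark's output matches its expected results.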