Should -msse3 enable fisttp

2005-09-29 Thread Evan Cheng

Hi,

I know this has been discussed in bug 18668. But I'd like to bring it  
up again.


Currently, fisttp is only generated with -march=prescott. The
argument is that fisttp is not an SSE instruction. While this is
technically true, it is likely to surprise users. Intel, after
all, does lump fisttp in with SSE3. The AMD64 Architecture Programmer's
Manual also explicitly states that fisttp is an SSE3 instruction.


Please consider enabling fisttp with -msse3.

Thanks,

Evan
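
For readers unfamiliar with the instruction: fisttp stores ST(0) as an integer using truncation, regardless of the rounding mode set in the x87 control word. A C floating-point-to-integer cast needs exactly that behaviour, so the function below is the kind of code the thread is about. This is only an illustrative sketch; the name truncate_to_int is made up here, and whether a particular GCC emits fisttp for it depends on the version and on the flags discussed in this thread.

```c
/* Convert a double to int the way C requires: truncate toward zero.
   On x87 without fisttp, the compiler typically has to switch the
   rounding mode around a fistp to get this result.
   (truncate_to_int is a hypothetical name used for illustration.) */
int truncate_to_int(double x)
{
    return (int)x;
}
```

One way to see what the compiler chose is to build a file containing this function with -S and compare the output with and without -march=prescott.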


Re: Should -msse3 enable fisttp?

2005-10-03 Thread Evan Cheng

Hi Uros,

Since you are the one who enabled fisttp, I figure I should send this  
email to you directly.


Let me know what you think. I kind of agree with your argument. But
for practical reasons I think -msse3 should enable fisttp. Certainly
here at Apple, a few folks have been surprised by this.


Thanks,

Evan
Senior Compiler Engineer
Apple Computers






Re: Should -msse3 enable fisttp

2005-10-03 Thread Evan Cheng
My mistake. I misunderstood the meaning of -msse3 (it only enables  
the sse3 builtins). Please ignore.





Evan Cheng
Senior Compiler Engineer
Apple Computer, Inc.






Re: Should -msse3 enable fisttp

2005-10-03 Thread Evan Cheng
Well, both Intel and AMD call fisttp an SSE3 instruction even though
it operates on the x87 stack ST(0). My argument is that users who specify
-msse3 to turn on SSE3 instructions would expect fisttp to be turned on
as well.


But according to the manual -msse3 does not turn on generation of  
SSE3 instructions:



    -mmmx
    -mno-mmx
    -msse
    -mno-sse
    -msse2
    -mno-sse2
    -msse3
    -mno-sse3
    -m3dnow
    -mno-3dnow
        These switches enable or disable the use of built-in functions that
        allow direct access to the MMX, SSE, SSE2, SSE3 and 3Dnow
        extensions of the instruction set.

        See X86 Built-in Functions, for details of the functions enabled
        and disabled by these switches.

        To have SSE/SSE2 instructions generated automatically from
        floating-point code, see -mfpmath=sse.


Thus the confusion.
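
To make the "built-in functions" reading of the manual concrete, here is a minimal sketch of code that uses an SSE3 intrinsic only when the compiler has SSE3 enabled. It relies on GCC predefining __SSE3__ when -msse3 (or an -march that implies it) is in effect, and on _mm_hadd_ps from <pmmintrin.h>. The function name sum4 is made up for illustration.

```c
#ifdef __SSE3__
#include <pmmintrin.h>   /* SSE3 intrinsics, usable only with SSE3 enabled */
#endif

/* Sum the four floats in v (sum4 is a hypothetical name). */
float sum4(const float v[4])
{
#ifdef __SSE3__
    __m128 x = _mm_loadu_ps(v);   /* {v0, v1, v2, v3} */
    x = _mm_hadd_ps(x, x);        /* {v0+v1, v2+v3, v0+v1, v2+v3} */
    x = _mm_hadd_ps(x, x);        /* every lane: (v0+v1)+(v2+v3) */
    return _mm_cvtss_f32(x);      /* extract lane 0 */
#else
    /* Scalar fallback with the same association order. */
    return (v[0] + v[1]) + (v[2] + v[3]);
#endif
}
```

Built without -msse3 the scalar branch is compiled; built with it, the haddps intrinsic becomes available. That is separate from the question of which instructions the compiler chooses on its own for ordinary code.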

Evan

On Oct 3, 2005, at 3:25 PM, Andrew Pinski wrote:



On Oct 3, 2005, at 5:56 PM, Evan Cheng wrote:


My mistake. I misunderstood the meaning of -msse3 (it only enables  
the sse3 builtins). Please ignore.





Actually it enables more than the builtins.  It enables the use of sse3
instructions.  This is just like -maltivec on PowerPC and -msse and
-msse on x86, etc.

Hmm, but from the original patch:
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01119.html
"BTW: Regarding TARGET_FISTTP macro: according to documentation,  
fisttp insn indeed depends on (TARGET_80387 && TARGET_SSE3).  
However, this insn is not a SSE3 instruction, so it should not be  
disabled by -mno-sse3 flag."


And then RTH agreed:
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01432.html

So from the sound of it fisttp is not a SSE3 instruction.

Thanks,
Andrew Pinski







Need advice: x86 redundant compare to zero

2005-10-13 Thread Evan Cheng

Hi,

Here is a bit of code from bzip2:

#define mswap(zz1, zz2) { int zztmp = zz1; zz1 = zz2; zz2 = zztmp; }

void foo(int unLo, int unHi, int ltLo, int *ptr, char *block,
         int med, int d, int n) {
  while (1) {
    if (unLo > unHi) break;
    n = ((int)block[ptr[unLo]+d]) - med;
    if (n == 0) {
      mswap(ptr[unLo], ptr[ltLo]);
      ltLo++; unLo++; continue;
    }
    if (n >  0) break;
    unLo++;
  }
}

gcc produces the following code:

        .text
        .globl _foo
_foo:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        subl    $12, %esp
        movl    8(%ebp), %edx
        cmpl    12(%ebp), %edx
        jg      L10
        movl    16(%ebp), %eax
        movl    20(%ebp), %ecx
        leal    (%ecx,%eax,4), %eax
        movl    %eax, -16(%ebp)
        jmp     L4
L12:
        movl    -16(%ebp), %ecx
        movl    (%ecx), %eax
        movl    %eax, (%esi)
        movl    -20(%ebp), %edi
        movl    %edi, (%ecx)
        addl    $4, %ecx
        movl    %ecx, -16(%ebp)
        addl    $1, %edx
        cmpl    %edx, 12(%ebp)
        jl      L10
L13:
        movl    20(%ebp), %ecx
L4:
        leal    (%ecx,%edx,4), %esi
        movl    (%esi), %edi
        movl    %edi, -20(%ebp)
        movl    24(%ebp), %eax
        addl    %edi, %eax
        movl    32(%ebp), %edi
        movsbl  (%eax,%edi),%eax
        subl    28(%ebp), %eax
        cmpl    $0, %eax        <-- extra compare
        je      L12
        jg      L10
        addl    $1, %edx
        cmpl    %edx, 12(%ebp)
        jge     L13
L10:
        addl    $12, %esp
        popl    %esi
        popl    %edi
        popl    %ebp
        ret

The cmpl is not needed because subl has already set the flags.
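
The pattern can be isolated in a much smaller function, which may be easier to experiment with than the bzip2 loop. The sketch below is a reduction made up for illustration (three_way is not from bzip2): one subtraction followed by an equality test and a signed greater-than test on the result. One caveat worth keeping in mind when reading the output: je only looks at the zero flag, but jg also reads the overflow flag, which cmpl $0 always clears and subl may set, so whether the compare can be dropped depends on the compiler being able to treat that difference as harmless.

```c
/* Returns 0 if block[index] equals med, 1 if it is greater, -1 if smaller.
   (three_way is a hypothetical reduction of the loop body above.) */
int three_way(const char *block, int index, int med)
{
    int n = (int)block[index] - med;   /* corresponds to the subl */
    if (n == 0)                        /* corresponds to je */
        return 0;
    if (n > 0)                         /* corresponds to jg */
        return 1;
    return -1;
}
```

Compiling this with -O2 -S and looking for a compare against $0 right after the subtract shows whether a given compiler version still emits the extra instruction.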

My question is: where and how would you suggest we do this
optimization? With peephole2? Or in combine? In i386.md, I see that
pattern *subsi_2 looks like what I'd like to combine these two insns
into:


(define_insn "*subsi_2"
  [(set (reg FLAGS_REG)
        (compare
          (minus:SI (match_operand:SI 1 "nonimmediate_operand" "0,0")
                    (match_operand:SI 2 "general_operand" "ri,rm"))
          (const_int 0)))
   (set (match_operand:SI 0 "nonimmediate_operand" "=rm,r")
        (minus:SI (match_dup 1) (match_dup 2)))]
  "ix86_match_ccmode (insn, CCGOCmode)
   && ix86_binary_operator_ok (MINUS, SImode, operands)"
  "sub{l}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "mode" "SI")])

But I do not see a peephole2 that would generate this insn. Does  
anyone know how this pattern is used?


Suggestions are appreciated!

Thanks,

Evan Cheng
Apple Computers, Inc.



Re: New GCC releases comparison and comparison of GCC4.4 and LLVM2.5 on SPEC2000

2009-05-13 Thread Evan Cheng


On May 13, 2009, at 4:51 AM, Duncan Sands wrote:


Hi,

Sorry, I forgot to mention that I used an additional option, -mpc64, for
32-bit GCC4.4.  It is not possible to generate the expected SPECFP2000
results with GCC4.4 without this option.  LLVM does not support this
option, and this option can significantly improve performance.  So the
32-bit comparison of SPECFP2000 should be taken with a grain of salt.


What does -mpc64 do exactly?  The gcc docs say:
 `-mpc64' rounds the significands of results of floating-point
operations to 53 bits (double precision)

Does this mean that a rounding operation is performed after each fp
operation, or that optimizations are permitted that don't result in
accurate extended double precision values as long as they are correct
to 53 bits, or something else?
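
As a small illustration of what "53 bits" means in the quoted documentation, the sketch below rounds a wider value to double precision and reports the width of a double's significand. It does not reproduce what -mpc64 does inside the compiler or the x87 control word; it only shows the precision being talked about. The function names are made up for illustration, and the 53-bit figure assumes an IEEE 754 double.

```c
#include <float.h>

/* Number of significand bits in a double (53 for IEEE 754 doubles). */
int double_significand_bits(void)
{
    return DBL_MANT_DIG;
}

/* Round a wider value to double precision. The volatile store keeps
   the value from lingering in a wider x87 register.
   (round_to_double is a hypothetical helper name.) */
double round_to_double(long double x)
{
    volatile double d = (double)x;
    return d;
}
```

For example, 2^53 + 1 (9007199254740993) needs 54 significant bits, so under the default round-to-nearest mode round_to_double(9007199254740993.0L) returns 9007199254740992.0.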

The LLVM code generators have an option called -limit-float-precision:
 -limit-float-precision=   - Generate low-precision inline sequences for some float libcalls
I'm not sure what it does exactly, but perhaps it is similar to
-mpc64?


No, that option inlines a small set of libcalls into code sequences
that implement low-precision math (6, 8, or 12 bits).


Evan



Ciao,

Duncan.