Re: Bug in floating point multiplication

Jason Swails Fri, 03 Jul 2015 18:15:07 -0700

On Fri, Jul 3, 2015 at 11:13 AM, Oscar Benjamin <oscar.j.benja...@gmail.com>
wrote:


> On 2 July 2015 at 18:29, Jason Swails <jason.swa...@gmail.com> wrote:
> >
> > As others have suggested, this is almost certainly a 32-bit vs. 64-bit
> > issue.  Consider the following C program:
> >
> > // maths.c
> > #include <math.h>
> > #include <stdio.h>
> >
> > int main() {
> >     double x;
> >     int i;
> >     x = 1-pow(0.5, 53);
> >
> >     for (i = 1; i < 1000000; i++) {
> >         if ((int)(i*x) == i) {
> >             printf("%d\n", i);
> >             break;
> >         }
> >     }
> >
> >     return 0;
> > }
> >
> > For the most part, this should be as close to an exact transliteration of
> > your Python code as possible.
> >
> > Here's what I get when I try compiling and running it on my 64-bit
> > (Gentoo) Linux machine with 32-bit compatible libs:
> >
> > swails@batman ~/test $ gcc maths.c
> > swails@batman ~/test $ ./a.out
> > swails@batman ~/test $ gcc -m32 maths.c
> > swails@batman ~/test $ ./a.out
> > 2049
>
> I was unable to reproduce this on my system. In both cases the loops
> run to completion. A look at the assembly generated by gcc shows that
> something different goes on there though.
>
> The loop in the 64 bit one (in the main function) looks like:
>
> $ objdump -d a.out | less
> ...
> 400555:  pxor   %xmm0,%xmm0
> 400559:  cvtsi2sdl -0xc(%rbp),%xmm0
> 40055e:  mulsd  -0x8(%rbp),%xmm0
> 400563:  cvttsd2si %xmm0,%eax
> 400567:  cmp    -0xc(%rbp),%eax
> 40056a:  jne    400582 <main+0x4c>
> 40056c:  mov    -0xc(%rbp),%eax
> 40056f:  mov    %eax,%esi
> 400571:  mov    $0x400624,%edi
> 400576:  mov    $0x0,%eax
> 40057b:  callq  400410 <printf@plt>
> 400580:  jmp    40058f <main+0x59>
> 400582:  addl   $0x1,-0xc(%rbp)
> 400586:  cmpl   $0xf423f,-0xc(%rbp)
> 40058d:  jle    400555 <main+0x1f>
> ...
>
> Whereas the 32 bit one looks like:
>
> $ objdump -d a.out.32 | less
> ...
>  804843e:  fildl  -0x14(%ebp)
>  8048441:  fmull  -0x10(%ebp)
>  8048444:  fnstcw -0x1a(%ebp)
>  8048447:  movzwl -0x1a(%ebp),%eax
>  804844b:  mov    $0xc,%ah
>  804844d:  mov    %ax,-0x1c(%ebp)
>  8048451:  fldcw  -0x1c(%ebp)
>  8048454:  fistpl -0x20(%ebp)
>  8048457:  fldcw  -0x1a(%ebp)
>  804845a:  mov    -0x20(%ebp),%eax
>  804845d:  cmp    -0x14(%ebp),%eax
>  8048460:  jne    8048477 <main+0x5c>
>  8048462:  sub    $0x8,%esp
>  8048465:  pushl  -0x14(%ebp)
>  8048468:  push   $0x8048520
>  804846d:  call   80482f0 <printf@plt>
>  8048472:  add    $0x10,%esp
>  8048475:  jmp    8048484 <main+0x69>
>  8048477:  addl   $0x1,-0x14(%ebp)
>  804847b:  cmpl   $0xf423f,-0x14(%ebp)
>  8048482:  jle    804843e <main+0x23>
> ...
>
> So the 64 bit one is using SSE instructions and the 32-bit one is
> using x87. That could explain the difference you see at the C level
> but I don't see it on this CPU (/proc/cpuinfo says Intel(R) Core(TM)
> i5-3427U CPU @ 1.80GHz).
>

Hmm.  Well, that could explain why you don't get the same results as me.
My CPU is an AMD FX(tm)-6100 Six-Core Processor (from /proc/cpuinfo).  My
objdump looks the same as yours for the 64-bit version, but for 32-bit it
looks like:

...
 804843a:       db 44 24 14             fildl  0x14(%esp)
 804843e:       dc 4c 24 18             fmull  0x18(%esp)
 8048442:       dd 5c 24 08             fstpl  0x8(%esp)
 8048446:       f2 0f 2c 44 24 08       cvttsd2si 0x8(%esp),%eax
 804844c:       3b 44 24 14             cmp    0x14(%esp),%eax
 8048450:       75 16                   jne    8048468 <main+0x4b>
 8048452:       8b 44 24 14             mov    0x14(%esp),%eax
 8048456:       89 44 24 04             mov    %eax,0x4(%esp)
 804845a:       c7 04 24 10 85 04 08    movl   $0x8048510,(%esp)
 8048461:       e8 8a fe ff ff          call   80482f0 <printf@plt>
 8048466:       eb 0f                   jmp    8048477 <main+0x5a>
 8048468:       83 44 24 14 01          addl   $0x1,0x14(%esp)
 804846d:       81 7c 24 14 3f 42 0f    cmpl   $0xf423f,0x14(%esp)
 8048474:       00
 8048475:       7e c3                   jle    804843a <main+0x1d>
...



However, I have no experience looking at raw assembler, so I can't discern
what it is I'm even looking at (nor do I know what explicit SSE
instructions look like in assembler).
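
One way to probe what these instruction sequences do numerically, without
reading the assembler in detail, is to model their rounding steps with exact
rational arithmetic. The sketch below is in Python rather than C, and it rests
on assumptions that are not stated in this thread: that fmull rounds the
product to the x87 64-bit significand (the usual default precision on Linux),
that fstpl then rounds that value again to a 53-bit double, that mulsd rounds
once to 53 bits, and that fistpl and cvttsd2si truncate toward zero as used
here. The names round_to_bits and first_match are made up for illustration.

```python
from fractions import Fraction

# Exact value of the C double x = 1 - pow(0.5, 53).
X = 1 - Fraction(1, 2**53)

def round_to_bits(value, bits):
    """Round a positive Fraction to `bits` significant bits, ties to even."""
    e = value.numerator.bit_length() - value.denominator.bit_length()
    if Fraction(2) ** e > value:
        e -= 1                       # now 2**e <= value < 2**(e + 1)
    ulp = Fraction(2) ** (e - bits + 1)
    return round(value / ulp) * ulp  # round() on a Fraction is half-to-even

def first_match(product_bits, store_as_double, limit=5000):
    """Return the first i with trunc(i * X) == i under the modelled rounding."""
    for i in range(1, limit):
        p = round_to_bits(i * X, product_bits)   # the multiply
        if store_as_double:
            p = round_to_bits(p, 53)             # assumed effect of fstpl
        if int(p) == i:                          # truncating conversion
            return i
    return None

print(first_match(53, False))  # modelled mulsd + cvttsd2si
print(first_match(64, False))  # modelled fmull + fistpl
print(first_match(64, True))   # modelled fmull + fstpl + cvttsd2si
```

Under these assumptions the model prints None, None and 2049: only the
sequence that multiplies in extended precision and then stores to a double
before truncating finds a match, and it finds it at the same value that my
32-bit binary printed.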

I have a Mac that runs an Intel Core i5, and, like you, both 32- and 64-bit
versions run to completion.  Which is at least consistent with what others
are seeing with Python.
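
To compare a particular Python build against the C results, a loop along the
following lines can be run directly. This is only a sketch: it assumes the
original Python code has the same shape as the C program quoted above, and
first_match is a name chosen here for illustration.

```python
import struct
import sys

def first_match(limit=1_000_000):
    """Return the first i with int(i * x) == i, or None if there is none."""
    x = 1 - 0.5 ** 53
    for i in range(1, limit):
        if int(i * x) == i:
            return i
    return None

if __name__ == "__main__":
    # Pointer size distinguishes 32-bit from 64-bit interpreter builds.
    print("pointer size (bits):", struct.calcsize("P") * 8)
    print("float significand bits:", sys.float_info.mant_dig)
    print("first match:", first_match())
```

A result of None corresponds to the C loop running to completion without
printing anything.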

All the best,
Jason

-- 
https://mail.python.org/mailman/listinfo/python-list
