On Fri, Jul 3, 2015 at 11:13 AM, Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:
> On 2 July 2015 at 18:29, Jason Swails <jason.swa...@gmail.com> wrote:
> >
> > As others have suggested, this is almost certainly a 32-bit vs. 64-bit
> > issue.  Consider the following C program:
> >
> > // maths.c
> > #include <math.h>
> > #include <stdio.h>
> >
> > int main() {
> >     double x;
> >     int i;
> >     x = 1-pow(0.5, 53);
> >
> >     for (i = 1; i < 1000000; i++) {
> >         if ((int)(i*x) == i) {
> >             printf("%d\n", i);
> >             break;
> >         }
> >     }
> >
> >     return 0;
> > }
> >
> > For the most part, this should be as close to an exact transliteration of
> > your Python code as possible.
> >
> > Here's what I get when I try compiling and running it on my 64-bit (Gentoo)
> > Linux machine with 32-bit compatible libs:
> >
> > swails@batman ~/test $ gcc maths.c
> > swails@batman ~/test $ ./a.out
> > swails@batman ~/test $ gcc -m32 maths.c
> > swails@batman ~/test $ ./a.out
> > 2049
>
> I was unable to reproduce this on my system. In both cases the loops
> run to completion. A look at the assembly generated by gcc shows that
> something different goes on there though.
>
> The loop in the 64 bit one (in the main function) looks like:
>
> $ objdump -d a.out | less
> ...
>   400555:  pxor      %xmm0,%xmm0
>   400559:  cvtsi2sdl -0xc(%rbp),%xmm0
>   40055e:  mulsd     -0x8(%rbp),%xmm0
>   400563:  cvttsd2si %xmm0,%eax
>   400567:  cmp       -0xc(%rbp),%eax
>   40056a:  jne       400582 <main+0x4c>
>   40056c:  mov       -0xc(%rbp),%eax
>   40056f:  mov       %eax,%esi
>   400571:  mov       $0x400624,%edi
>   400576:  mov       $0x0,%eax
>   40057b:  callq     400410 <printf@plt>
>   400580:  jmp       40058f <main+0x59>
>   400582:  addl      $0x1,-0xc(%rbp)
>   400586:  cmpl      $0xf423f,-0xc(%rbp)
>   40058d:  jle       400555 <main+0x1f>
> ...
>
> Whereas the 32 bit one looks like:
>
> $ objdump -d a.out.32 | less
> ...
>   804843e:  fildl  -0x14(%ebp)
>   8048441:  fmull  -0x10(%ebp)
>   8048444:  fnstcw -0x1a(%ebp)
>   8048447:  movzwl -0x1a(%ebp),%eax
>   804844b:  mov    $0xc,%ah
>   804844d:  mov    %ax,-0x1c(%ebp)
>   8048451:  fldcw  -0x1c(%ebp)
>   8048454:  fistpl -0x20(%ebp)
>   8048457:  fldcw  -0x1a(%ebp)
>   804845a:  mov    -0x20(%ebp),%eax
>   804845d:  cmp    -0x14(%ebp),%eax
>   8048460:  jne    8048477 <main+0x5c>
>   8048462:  sub    $0x8,%esp
>   8048465:  pushl  -0x14(%ebp)
>   8048468:  push   $0x8048520
>   804846d:  call   80482f0 <printf@plt>
>   8048472:  add    $0x10,%esp
>   8048475:  jmp    8048484 <main+0x69>
>   8048477:  addl   $0x1,-0x14(%ebp)
>   804847b:  cmpl   $0xf423f,-0x14(%ebp)
>   8048482:  jle    804843e <main+0x23>
> ...
>
> So the 64 bit one is using SSE instructions and the 32-bit one is
> using x87. That could explain the difference you see at the C level
> but I don't see it on this CPU (/proc/cpuinfo says Intel(R) Core(TM)
> i5-3427U CPU @ 1.80GHz).

Hmm.  Well that could explain why you don't get the same results as me.
My CPU is an AMD FX(tm)-6100 Six-Core Processor (from /proc/cpuinfo).  My
objdump looks the same as yours for the 64-bit version, but for 32-bit it
looks like:

...
 804843a:  db 44 24 14           fildl     0x14(%esp)
 804843e:  dc 4c 24 18           fmull     0x18(%esp)
 8048442:  dd 5c 24 08           fstpl     0x8(%esp)
 8048446:  f2 0f 2c 44 24 08     cvttsd2si 0x8(%esp),%eax
 804844c:  3b 44 24 14           cmp       0x14(%esp),%eax
 8048450:  75 16                 jne       8048468 <main+0x4b>
 8048452:  8b 44 24 14           mov       0x14(%esp),%eax
 8048456:  89 44 24 04           mov       %eax,0x4(%esp)
 804845a:  c7 04 24 10 85 04 08  movl      $0x8048510,(%esp)
 8048461:  e8 8a fe ff ff        call      80482f0 <printf@plt>
 8048466:  eb 0f                 jmp       8048477 <main+0x5a>
 8048468:  83 44 24 14 01        addl      $0x1,0x14(%esp)
 804846d:  81 7c 24 14 3f 42 0f  cmpl      $0xf423f,0x14(%esp)
 8048474:  00
 8048475:  7e c3                 jle       804843a <main+0x1d>
...

However, I have no experience looking at raw assembler, so I can't discern
what it is I'm even looking at (nor do I know what explicit SSE
instructions look like in assembler).

I have a Mac that runs an Intel Core i5 and, as on your machine, both the
32- and 64-bit versions run to completion there, which is at least
consistent with what others are seeing with Python.

All the best,
Jason