Hello,

I am studying the differences between the x86 code generated by DMD and by C/C++ compilers on Windows (simply put: why exactly, and by what margin, DMD-compiled D code is often slower than the GCC-compiled C/C++ equivalent).

Now, I have this simple D program:

-----
immutable int MAX_N = 1_000_000;
void main () {
    int [MAX_N] a;
    foreach (i; 0..MAX_N)
        a[i] = i;
}
-----

(I know there's iota in std.range, and it turns out to be even slower - but that's a high level function, and I'm trying to understand the lower-level details now.)

The assembly (dmd -O -release -inline -noboundscheck, then obj2asm) contains the following fragment, which corresponds to the loop:

-----
L2C:            mov     -03D0900h[EDX*4][EBP],EDX
                mov     ECX,EDX
                inc     EDX
                cmp     EDX,0F4240h
                jb      L2C
-----

Now, I am not exactly fluent in assembly, but the "mov ECX,EDX" seems unnecessary. The ECX register is explicitly used only three times in the whole program, and it looks like this instruction could at least be moved out of the loop, if not removed completely. Is it indeed a bug, or is there some reason for it? And if the former, where do I report it: at http://d.puremagic.com/issues/, as with the front-end?

I didn't try GDC or LDC since I couldn't find clear instructions for using them under Win32. If there are any, please kindly point me to them. I found a few explanations for GDC, but had a hard time figuring out which one is the most current.

Note that the C++ version does the same with four instructions instead of five, which is what the D version would come to if the instruction in question were removed. The code inside the loop looks like this:

-----
L3:
        movl    %eax, _a(,%eax,4)
        addl    $1, %eax
        cmpl    $1000000, %eax
        jne     L3
-----
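
For reference, a C++ program of roughly the following shape is the kind of source that gives such a loop. This is only a minimal sketch, assuming a global array named a (as the _a symbol in the listing suggests); the helper name fill_array is hypothetical, and the actual file at the link below may differ in details.

```cpp
// Sketch only: a global array, as suggested by the `_a` symbol in the listing.
const int MAX_N = 1000000;
int a[MAX_N];

// Hypothetical helper (the name is an assumption, not taken from the original source):
// stores each index into the corresponding array element.
void fill_array()
{
    for (int i = 0; i < MAX_N; i++)
        a[i] = i;
}
```

Calling fill_array() from main and compiling with optimization enabled (for example, g++ -O2 -S) should give a loop of a similar shape, though the exact instructions depend on the GCC version and flags.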

The full assembly listings, and the source codes (D and C++), are here:
http://acm.math.spbu.ru/~gassa/dlang/simple_loop/

I've tried a few other versions as well. Changing the loop to an explicit "for (int i = 0; i < MAX_N; i++)" (a2.d) does not affect the generated assembly. Making the array dynamic (a3.d) leads to five instructions, all seemingly important. A __gshared static array (a4.d) gives the same seemingly unneeded instruction but with EAX instead of ECX:

-----
L2:             mov     _D2a41aG1000000i[EDX*4],EDX
                mov     EAX,EDX
                inc     EDX
                cmp     EDX,0F4240h
                jb      L2
-----

Ivan Kazmenko.
