On Dec 4, 2010, at 5:22 AM, Florian Weimer wrote:

> * Joe Buck:
> 
>> It's wasted code if the multiply instruction detects the overflow.
>> It's true that the cost is small (maybe just one extra instruction
>> and the same number of tests, maybe one more on architectures where you
>> have to load a large constant), but it is slightly worse code than what
>> Chris Lattner showed.
> 
> It's possible to improve slightly on the LLVM code by using the
> overflow flag (at least on i386/amd64), as explained in this blog
> post:
> 
> <http://blogs.msdn.com/b/michael_howard/archive/2005/12/06/500629.aspx>

Ah, great point.  I improved the clang codegen to this:

$ cat t.cc 
void *test(long count) {
      return new int[count];
}
$ clang t.cc -S -o - -O3 -mkernel -fomit-frame-pointer -mllvm -show-mc-encoding
        .section        __TEXT,__text,regular,pure_instructions
        .globl  __Z4testl
        .align  4, 0x90
__Z4testl:                              ## @_Z4testl
## BB#0:                                ## %entry
        movl    $4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
        movq    %rdi, %rax              ## encoding: [0x48,0x89,0xf8]
        mulq    %rcx                    ## encoding: [0x48,0xf7,0xe1]
        movq    $-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
        cmovnoq %rax, %rdi              ## encoding: [0x48,0x0f,0x41,0xf8]
        jmp     __Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]
                                        ##   fixup A - offset: 1, value: __Znam-1, kind: FK_PCRel_1
.subsections_via_symbols

This could be further improved by inverting the cmov condition to avoid the 
first movq, which we'll tackle as a general regalloc improvement.
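
For anyone reading the assembly cold, here is a rough source-level sketch of
what the mulq/cmovnoq pair computes: multiply the element count by the element
size, and substitute an all-ones size if the multiply overflowed.  The helper
name checked_array_bytes is made up for illustration, and
__builtin_mul_overflow is a GCC/Clang extension that postdates this thread, so
treat this as an approximation of the codegen rather than what clang emits
internally.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of clang or the original post): returns the
// byte size needed for `count` elements of `elem_size` bytes each, or
// SIZE_MAX if the product does not fit in size_t.
inline std::size_t checked_array_bytes(std::size_t count, std::size_t elem_size) {
    std::size_t bytes;
    // GCC/Clang extension: stores the wrapped product in `bytes` and
    // returns true if the multiplication overflowed.
    if (__builtin_mul_overflow(count, elem_size, &bytes))
        return SIZE_MAX;  // corresponds to the movq $-1 / cmovnoq pair above
    return bytes;
}
```

A negative count converted to size_t becomes a very large value, so it takes
the overflow path as well.  The all-ones size is then handed to operator
new[], which is expected to fail the request instead of returning an
undersized buffer.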

Thanks for the pointer!

-Chris
