On Sun, Dec 5, 2010 at 6:49 PM, Chris Lattner <clatt...@apple.com> wrote:
>
> On Dec 5, 2010, at 3:19 AM, Richard Guenther wrote:
>
>>> $ clang t.cc -S -o - -O3 -mkernel -fomit-frame-pointer -mllvm -show-mc-encoding
>>>         .section        __TEXT,__text,regular,pure_instructions
>>>         .globl  __Z4testl
>>>         .align  4, 0x90
>>> __Z4testl:                              ## @_Z4testl
>>> ## BB#0:                                ## %entry
>>>         movl    $4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
>>>         movq    %rdi, %rax              ## encoding: [0x48,0x89,0xf8]
>>>         mulq    %rcx                    ## encoding: [0x48,0xf7,0xe1]
>>>         movq    $-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
>>>         cmovnoq %rax, %rdi              ## encoding: [0x48,0x0f,0x41,0xf8]
>>>         jmp     __Znam                  ## TAILCALL
>>>                                         ## encoding: [0xeb,A]
>>>                                         ##   fixup A - offset: 1, value: __Znam-1, kind: FK_PCRel_1
>>>         .subsections_via_symbols
>>>
>>> This could be further improved by inverting the cmov condition to avoid
>>> the first movq, which we'll tackle as a general regalloc improvement.
>>
>> I'm curious as to how you represent the overflow checking in your
>> high-level IL.
>
> The (optimized) generated IR is:
>
> $ clang t.cc -emit-llvm -S -o - -O3
> ...
> define noalias i8* @_Z4testl(i64 %count) ssp {
> entry:
>   %0 = tail call %0 @llvm.umul.with.overflow.i64(i64 %count, i64 4)
>   %1 = extractvalue %0 %0, 1
>   %2 = extractvalue %0 %0, 0
>   %3 = select i1 %1, i64 -1, i64 %2
>   %call = tail call noalias i8* @_Znam(i64 %3)
>   ret i8* %call
> }
>
> More information on the overflow intrinsics is here:
> http://llvm.org/docs/LangRef.html#int_overflow
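For reference, the thread never shows t.cc itself; a plausible reconstruction
(an assumption, inferred from the mangled name _Z4testl, the multiply by
sizeof(int) == 4, and the tail call to _Znam, i.e. operator new[]) would be:

  // Hypothetical t.cc, reconstructed from the assembly and IR above;
  // not taken from the thread.  new int[count] needs count * sizeof(int)
  // bytes, so clang emits llvm.umul.with.overflow and passes (size_t)-1
  // to operator new[] when the multiplication overflows, making the
  // allocation fail instead of silently wrapping to a too-small size.
  void *test(long count) {
    return new int[count];
  }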
Ah, you're using intrinsics.  I had thought of re-using the saturating
arithmetic and types we already have, basically doing

  size = (unsigned sat int) count * 4;

and deferring optimal expansion to an optab.  That of course requires
saturating arithmetic emulation for targets that don't provide an
expander, but it would at least allow optimal expansion.  And it'll
unleash all the latent bugs we have with saturating types ...

Richard.

> -Chris
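A sketch of the saturating-multiply semantics described above, written as
plain C++ emulation (note: "unsigned sat int" is not an actual GCC
source-level type; the function below only illustrates the kind of fallback
a target without a saturating-multiply expander would need):

  #include <cstddef>
  #include <cstdint>

  // Saturating unsigned multiply: returns SIZE_MAX instead of wrapping.
  static std::size_t sat_mul(std::size_t count, std::size_t elem_size) {
    std::size_t size = count * elem_size;
    if (elem_size != 0 && size / elem_size != count)
      size = SIZE_MAX;      // saturate on overflow
    return size;
  }

With that representation the overflow handling falls out of the type, and a
target would still be free to expand sat_mul(count, 4) to the mul/cmov
sequence shown at the top of the thread.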