On Sun, Aug 14, 2011 at 7:24 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> Hello!
>
> We can use ROUNDSP/ROUNDSD in round(a) expansion. Currently, we expand
> round(a) as (-O2 -ffast-math):
>
> .LFB0:
>        .cfi_startproc
>        movsd   .LC1(%rip), %xmm1
>        movapd  %xmm0, %xmm2
>        movsd   .LC0(%rip), %xmm3
>        andpd   %xmm1, %xmm2
>        ucomisd %xmm2, %xmm3
>        jbe     .L2
>        addsd   .LC2(%rip), %xmm2
>        andnpd  %xmm0, %xmm1
>        movapd  %xmm1, %xmm0
>        cvttsd2siq      %xmm2, %rax
>        cvtsi2sdq       %rax, %xmm2
>        orpd    %xmm2, %xmm0
> .L2:
>        rep
>        ret
>
> Adding -msse4, we now generate branchless code using roundsd:
>
> .LFB0:
>        .cfi_startproc
>        movsd   .LC0(%rip), %xmm2
>        movapd  %xmm0, %xmm1
>        andpd   %xmm2, %xmm1
>        andnpd  %xmm0, %xmm2
>        addsd   .LC1(%rip), %xmm1
>        roundsd $1, %xmm1, %xmm1
>        orpd    %xmm2, %xmm1
>        movapd  %xmm1, %xmm0
>        ret

Hm, why do we need the sign-copy?  If I read the docs correctly,
we can simply use roundsd directly, no?

> The patch also simplifies a couple of checks in related patterns.
>
> 2011-08-14  Uros Bizjak  <ubiz...@gmail.com>
>
>        * config/i386/i386.c (ix86_expand_round_sse4): New function.
>        * config/i386/i386-protos.h (ix86_expand_round_sse4): New prototype.
>        * config/i386/i386.md (round<mode>2): Use ix86_expand_round_sse4
>        for TARGET_ROUND.
>
>        (rint<mode>2): Simplify TARGET_ROUND check.
>        (floor<mode>2): Ditto.
>        (ceil<mode>2): Ditto.
>        (btrunc<mode>2): Ditto.
>
> Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
> will be committed to mainline soon.
>
> Uros.
>
