On Sun, Aug 14, 2011 at 7:24 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> Hello!
>
> We can use ROUNDSP/ROUNDSD in round(a) expansion. Currently, we expand
> round(a) as (-O2 -ffast-math):
>
> .LFB0:
>         .cfi_startproc
>         movsd   .LC1(%rip), %xmm1
>         movapd  %xmm0, %xmm2
>         movsd   .LC0(%rip), %xmm3
>         andpd   %xmm1, %xmm2
>         ucomisd %xmm2, %xmm3
>         jbe     .L2
>         addsd   .LC2(%rip), %xmm2
>         andnpd  %xmm0, %xmm1
>         movapd  %xmm1, %xmm0
>         cvttsd2siq      %xmm2, %rax
>         cvtsi2sdq       %rax, %xmm2
>         orpd    %xmm2, %xmm0
> .L2:
>         rep
>         ret
>
> Adding -msse4, we now generate branchless code using roundsd:
>
> .LFB0:
>         .cfi_startproc
>         movsd   .LC0(%rip), %xmm2
>         movapd  %xmm0, %xmm1
>         andpd   %xmm2, %xmm1
>         andnpd  %xmm0, %xmm2
>         addsd   .LC1(%rip), %xmm1
>         roundsd $1, %xmm1, %xmm1
>         orpd    %xmm2, %xmm1
>         movapd  %xmm1, %xmm0
>         ret

Hm, why do we need the sign-copy?  If I read the docs correctly we can
simply use roundsd directly, no?

> The patch also simplifies a couple of checks in related patterns.
>
> 2011-08-14  Uros Bizjak  <ubiz...@gmail.com>
>
>         * config/i386/i386.c (ix86_expand_round_sse4): New function.
>         * config/i386/i386-protos.h (ix86_expand_round_sse4): New prototype.
>         * config/i386/i386.md (round<mode>2): Use ix86_expand_round_sse4
>         for TARGET_ROUND.
>
>         (rint<mode>2): Simplify TARGET_ROUND check.
>         (floor<mode>2): Ditto.
>         (ceil<mode>2): Ditto.
>         (btrunc<mode>2): Ditto.
>
> Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
> will be committed to mainline soon.
>
> Uros.
>