Hi,

Przemyslaw Wirkus <przemyslaw.wir...@arm.com> writes:
> Hi all,
>
> Vectorise __builtin_signbit (v4sf) with unsigned shift right vector
> instruction.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Assembly output for:
> $ aarch64-elf-gcc -S -O3 signbitv4sf.c -dp
>
> Before patch:
>
> foo:
>         adrp    x3, in                   // 37 [c=4 l=4]  *movdi_aarch64/12
>         adrp    x2, out                  // 40 [c=4 l=4]  *movdi_aarch64/12
>         add     x3, x3, :lo12:in         // 39 [c=4 l=4]  add_losym_di
>         add     x2, x2, :lo12:out        // 42 [c=4 l=4]  add_losym_di
>         mov     x0, 0                    // 3 [c=4 l=4]  *movdi_aarch64/3
>         .p2align 3,,7
> .L2:
>         ldr     w1, [x3, x0]             // 10 [c=16 l=4]  *zero_extendsidi2_aarch64/1
>         and     w1, w1, -2147483648      // 11 [c=4 l=4]  andsi3/1
>         str     w1, [x2, x0]             // 16 [c=4 l=4]  *movsi_aarch64/8
>         add     x0, x0, 4                // 17 [c=4 l=4]  *adddi3_aarch64/0
>         cmp     x0, 4096                 // 19 [c=4 l=4]  cmpdi/1
>         bne     .L2                      // 20 [c=4 l=4]  condjump
>         ret                              // 50 [c=0 l=4]  *do_return
>
> After patch:
>
> foo:
>         adrp    x2, in                   // 36 [c=4 l=4]  *movdi_aarch64/12
>         adrp    x1, out                  // 39 [c=4 l=4]  *movdi_aarch64/12
>         add     x2, x2, :lo12:in         // 38 [c=4 l=4]  add_losym_di
>         add     x1, x1, :lo12:out        // 41 [c=4 l=4]  add_losym_di
>         mov     x0, 0                    // 3 [c=4 l=4]  *movdi_aarch64/3
>         .p2align 3,,7
> .L2:
>         ldr     q0, [x2, x0]             // 10 [c=8 l=4]  *aarch64_simd_movv4sf/0
>         ushr    v0.4s, v0.4s, 31         // 11 [c=12 l=4]  aarch64_simd_lshrv4si
>         str     q0, [x1, x0]             // 15 [c=4 l=4]  *aarch64_simd_movv4si/2
>         add     x0, x0, 16               // 16 [c=4 l=4]  *adddi3_aarch64/0
>         cmp     x0, 4096                 // 18 [c=4 l=4]  cmpdi/1
>         bne     .L2                      // 19 [c=4 l=4]  condjump
>         ret                              // 49 [c=0 l=4]  *do_return
>
> Thanks,
> Przemyslaw
>
> gcc/ChangeLog:
>
> 2019-03-20  Przemyslaw Wirkus  <przemyslaw.wir...@arm.com>
>
>         * config/aarch64/aarch64-builtins.c
>         (aarch64_builtin_vectorized_function): Added CASE_CFN_SIGNBIT.
>         * config/aarch64/aarch64-simd-builtins.def: (signbit)
>         Extend to V4SF mode.
>         * config/aarch64/aarch64-simd.md (signbitv4sf2): New expand
>         defined.
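
For readers following along: signbitv4sf.c itself is not included in the message. A minimal sketch of a loop that would plausibly produce code of this shape is below. It is a guess, not the actual test file. The symbol names foo, in and out come from the assembly; the element count of 1024 is inferred from the 4096-byte loop bound; the element types and the use of __builtin_signbitf are assumptions.

```c
#include <stddef.h>

/* Assumed: 4096 bytes / 4 bytes per element, from "cmp x0, 4096".  */
#define N 1024

/* Names match the symbols in the assembly; the types are assumptions.  */
float in[N];
int out[N];

void
foo (void)
{
  /* Each iteration stores a nonzero value iff the sign bit of in[i]
     is set.  This is the scalar loop the vectorizer is asked to
     turn into a V4SF operation.  */
  for (size_t i = 0; i < N; i++)
    out[i] = __builtin_signbitf (in[i]);
}
```

Note that the two listings store different nonzero values: the scalar code masks with 0x80000000 and stores the mask result, while the vector code shifts right by 31 and stores 0 or 1. Both are consistent with signbit, which only promises a nonzero result for a set sign bit.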

I think it'd be better to add a new IFN_SIGNBIT internal function that
maps to signbit_optab.  That way the compiler will know what the vector
function does and there'll be no need to add a new built-in function.

Thanks,
Richard