Hi all,

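The pattern "(x | y) - y" can be optimized to the simpler "(x & ~y)" (and-not) pattern: every bit set in y is also set in (x | y), so subtracting y just clears those bits without generating any borrows. On AArch64 this maps to a single BIC instruction.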

For example, given the following code:

$ cat main.c
int
f_i(int x, int y)
{
        return (x | y) - y;
}

long long
f_l(long long x, long long y)
{
        return (x | y) - y;
}

typedef int v4si __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

v4si
f_v4si(v4si a, v4si b) {
        return (a | b) - b;
}

v2di
f_v2di(v2di a, v2di b) {
        return (a | b) - b;
}

void
f(v4si *d, v4si *a, v4si *b, int n) {
        for (int i=0; i<n; i++)
                d[i] = (a[i] | b[i]) - b[i];
}

Before this patch:
$ ./aarch64-none-linux-gnu-gcc -S -O2 main.c -dp

f_i:
                orr     w0, w0, w1        // 8    [c=4 l=4]  iorsi3/0
                sub     w0, w0, w1        // 14   [c=4 l=4]  subsi3
                ret       // 24       [c=0 l=4]  *do_return
f_l:
                orr     x0, x0, x1        // 8    [c=4 l=4]  iordi3/0
                sub     x0, x0, x1        // 14   [c=4 l=4]  subdi3/0
                ret       // 24       [c=0 l=4]  *do_return
f_v4si:
                orr     v0.16b, v0.16b, v1.16b    // 8    [c=8 l=4]  iorv4si3/0
                sub     v0.4s, v0.4s, v1.4s       // 14 [c=8 l=4]  subv4si3
                ret       // 24       [c=0 l=4]  *do_return
f_v2di:
                orr     v0.16b, v0.16b, v1.16b    // 8    [c=8 l=4]  iorv2di3/0
                sub     v0.2d, v0.2d, v1.2d       // 14 [c=8 l=4]  subv2di3
                ret       // 24       [c=0 l=4]  *do_return

After this patch:
$ ./aarch64-none-linux-gnu-gcc -S -O2 main.c -dp

f_i:
                bic     w0, w0, w1      // 13   [c=8 l=4]  *bic_and_not_si3
                ret             // 23   [c=0 l=4]  *do_return
f_l:
                bic     x0, x0, x1      // 13   [c=8 l=4]  *bic_and_not_di3
                ret             // 23   [c=0 l=4]  *do_return
f_v4si:
                bic     v0.16b, v0.16b, v1.16b  // 13   [c=16 l=4]  *bic_and_not_simd_v4si3
                ret             // 23   [c=0 l=4]  *do_return
f_v2di:
                bic     v0.16b, v0.16b, v1.16b  // 13   [c=16 l=4]  *bic_and_not_simd_v2di3
                ret             // 23   [c=0 l=4]  *do_return

Bootstrapped and tested on aarch64-none-linux-gnu.

OK for master?

Cheers,
Przemyslaw

gcc/ChangeLog:

        PR tree-optimization/94880
        * config/aarch64/aarch64.md (bic_and_not_<mode>3): New define_insn.
        * config/aarch64/aarch64-simd.md (bic_and_not_simd_<mode>3): New
        define_insn.

gcc/testsuite/ChangeLog:

        PR tree-optimization/94880
        * gcc.target/aarch64/bic_and_not_di3.c: New test.
        * gcc.target/aarch64/bic_and_not_si3.c: New test.
        * gcc.target/aarch64/bic_and_not_v2di3.c: New test.
        * gcc.target/aarch64/bic_and_not_v4si3.c: New test.

Attachment: patch.patch
Description: patch.patch
