On 6/9/2022 1:52 AM, Tamar Christina via Gcc-patches wrote:
Hi All,

When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.  One for the
lowpart and one for the highpart.

The problem with this is that in RTL the lvalue of the RTX is the only thing
tying the two instructions together.

This means that e.g. combine is unable to try to combine the two instructions
for setting the lowpart and highpart.

For ISAs that have bit extract instructions we can eliminate one of the extracts
if, and only if, we're setting the entire complex number.

This patch changes the expand code, when we're setting the entire complex number,
to generate a subreg for the lowpart instead of a vec_extract.

This allows us to optimize sequences such as:

Just a note. I regularly see subregs significantly interfere with optimization, particularly register allocation. So be aware that subregs can often get in the way of generating good code. When changing something to use subregs I like to run real benchmarks rather than working with code snippets.

_Complex int f(int a, int b) {
     _Complex int t = a + b * 1i;
     return t;
}

from:

f:
        bfi     x2, x0, 0, 32
        bfi     x2, x1, 32, 32
        mov     x0, x2
        ret

into:

f:
        bfi     x0, x1, 32, 32
        ret
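
The two sequences above can be modelled in plain C to see why dropping the first insert is safe. This is only a sketch under stated assumptions: `bfi`, `pack_two_inserts`, `pack_one_insert` and `check_packing` are hypothetical helper names, not GCC or AArch64 APIs, and the 64-bit integer stands in for the register that holds the whole `_Complex int` (real part in bits 0..31, imaginary part in bits 32..63).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a bit-field insert: copy the low `width` bits of
   `src` into `dst` starting at bit `lsb`, leaving all other bits of `dst`
   untouched.  Assumes width >= 1 and lsb + width <= 64.  */
static uint64_t bfi(uint64_t dst, uint64_t src, unsigned lsb, unsigned width)
{
    uint64_t field = (width >= 64) ? ~UINT64_C(0)
                                   : ((UINT64_C(1) << width) - 1);
    uint64_t mask = field << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* Model of the "from" sequence: both halves are inserted into a scratch
   register, which is then copied to the return register.  */
static uint64_t pack_two_inserts(uint32_t re, uint32_t im)
{
    uint64_t x2 = 0;            /* scratch; both halves get overwritten */
    x2 = bfi(x2, re, 0, 32);    /* lowpart  */
    x2 = bfi(x2, im, 32, 32);   /* highpart */
    return x2;                  /* stands in for the final mov */
}

/* Model of the "into" sequence: the register already holding the real
   part is reused as the lowpart, so only the highpart is inserted.  */
static uint64_t pack_one_insert(uint32_t re, uint32_t im)
{
    uint64_t x0 = re;           /* lowpart is already in place */
    return bfi(x0, im, 32, 32); /* highpart */
}

/* Self-check: both models produce the same packed value.  */
static void check_packing(void)
{
    assert(pack_two_inserts(3, 4) == pack_one_insert(3, 4));
    assert(pack_one_insert(3, 4) == UINT64_C(0x0000000400000003));
    assert(pack_one_insert(UINT32_MAX, 0) == UINT64_C(0x00000000FFFFFFFF));
}
```

Calling `check_packing()` from a `main` function runs the comparison. The model only shows that the two sequences compute the same value; it says nothing about register allocation, which is where subregs can cause trouble.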

I have also confirmed the codegen for x86_64 did not change.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * emit-rtl.cc (validate_subreg): Accept subregs of complex modes.
        * expr.cc (emit_move_complex_parts): Emit subreg of lowpart if possible.

gcc/testsuite/ChangeLog:

        * g++.target/aarch64/complex-init.C: New test.

OK.

On a related topic, any thoughts on keeping complex objects as complex types/modes through gimple and into at least parts of the RTL pipeline?

The way complex arithmetic instructions work on our chip is going to be extremely tough to utilize in GCC -- we really need to keep the complex types/arithmetic up through RTL generation at the least. Ideally we'd even expose complex modes all the way to final. Is that something y'all could benefit from as well? Have y'all poked at this problem at all?

jeff
