On 6/13/2022 4:19 AM, Tamar Christina wrote:
-----Original Message-----
From: Gcc-patches <gcc-patches-
bounces+tamar.christina=arm....@gcc.gnu.org> On Behalf Of Jeff Law via
Gcc-patches
Sent: Sunday, June 12, 2022 6:27 PM
To: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to
set the lowpart.



On 6/9/2022 1:52 AM, Tamar Christina via Gcc-patches wrote:
Hi All,

When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.  One
for the lowpart and one for the highpart.

The problem with this is that in RTL the lvalue of the RTX is the only
thing tying the two instructions together.

This means that e.g. combine is unable to try to combine the two
instructions for setting the lowpart and highpart.

For ISAs that have bit extract instructions we can eliminate one of
the extracts if, and only if, we're setting the entire complex number.

This patch changes the expand code so that, when we're setting the
entire complex number, it generates a subreg for the lowpart instead of
a vec_extract.
This allows us to optimize sequences such as:
Just a note.  I regularly see subregs significantly interfere with optimization,
particularly register allocation.  So be aware that subregs can often get in the
way of generating good code.  When changing something to use subregs I
like to run real benchmarks rather than working with code snippets.


_Complex int f(int a, int b) {
      _Complex int t = a + b * 1i;
      return t;
}

from:

f:
        bfi     x2, x0, 0, 32
        bfi     x2, x1, 32, 32
        mov     x0, x2
        ret

into:

f:
        bfi     x0, x1, 32, 32
        ret
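
To illustrate what the two sequences above do to the 64-bit return
register, here is a minimal C model.  It assumes the layout the assembly
implies (real part in bits 0..31, imaginary part in bits 32..63).  The
function names are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* Mask selecting the low 32 bits (the real part in this model). */
#define LOW32 UINT64_C(0xffffffff)

/* "Before": both halves are inserted into a separate register whose
   previous contents (scratch) do not matter, then copied to x0. */
uint64_t pack_two_inserts(uint64_t scratch, uint32_t re, uint32_t im)
{
    uint64_t x2 = scratch;
    x2 = (x2 & ~LOW32) | re;                    /* bfi x2, x0, 0, 32  */
    x2 = (x2 & LOW32) | ((uint64_t)im << 32);   /* bfi x2, x1, 32, 32 */
    return x2;                                  /* mov x0, x2         */
}

/* "After": x0 already holds the real part in its low 32 bits, so it is
   reused as the lowpart and only the highpart is inserted. */
uint64_t pack_one_insert(uint64_t x0, uint32_t im)
{
    return (x0 & LOW32) | ((uint64_t)im << 32); /* bfi x0, x1, 32, 32 */
}
```

Both functions produce the same packed value for the same real and
imaginary inputs, regardless of what the untouched bits held beforehand.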

I have also confirmed the codegen for x86_64 did not change.

Bootstrapped and regtested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * emit-rtl.cc (validate_subreg): Accept subregs of complex modes.
        * expr.cc (emit_move_complex_parts): Emit subreg of lowpart if
possible.
gcc/testsuite/ChangeLog:

        * g++.target/aarch64/complex-init.C: New test.
OK.

On a related topic, any thoughts on keeping complex objects as complex
types/modes through gimple and into at least parts of the RTL pipeline?

The way complex arithmetic instructions work on our chip is going to be
extremely tough to utilize in GCC -- we really need to keep the complex
types/arithmetic up through RTL generation at the least. Ideally we'd even
expose complex modes all the way to final.    Is that something y'all could
benefit from as well?  Have y'all poked at this problem at all?
Not extensively, but right now the big advantage of lowering them early is for
auto-vec.   Lowering them early allows you to e.g. realize that you only need the
imaginary part of the number etc.  For
In the case where you only operate on part of the object, then decomposing early and using standard vops and fops on the components absolutely makes sense.  That we already do.

The case I want to handle is when we're operating on both the real and imaginary parts at the same time.  I've got instructions to do that, but where they find their operands is, umm, inconvenient -- a complex add ends up looking like a 2-element vector add.
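
As a rough sketch of why a complex add resembles a 2-element vector add, the C below compares the two.  The vec2f type and the helper names are hypothetical stand-ins, not GCC internals or any target's instructions; the only fact relied on is that C stores a complex value as two adjacent elements of its real type (real first, then imaginary).

```c
#include <complex.h>
#include <string.h>

/* Hypothetical 2-lane vector: lane 0 = real part, lane 1 = imaginary part. */
typedef struct { float lane[2]; } vec2f;

/* Reinterpret a complex value as two lanes; C guarantees a complex type
   has the same representation as an array of two of its real type. */
vec2f to_lanes(float complex z)
{
    vec2f v;
    memcpy(v.lane, &z, sizeof v.lane);
    return v;
}

/* Plain element-wise vector add. */
vec2f vec2f_add(vec2f a, vec2f b)
{
    vec2f r = { { a.lane[0] + b.lane[0], a.lane[1] + b.lane[1] } };
    return r;
}

/* Complex add as written in source code. */
float complex complex_add(float complex a, float complex b)
{
    return a + b;
}
```

For any inputs, to_lanes(complex_add(a, b)) and vec2f_add(to_lanes(a), to_lanes(b)) hold the same two values.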


auto-vec it also means we treat them as
just any other loads/stores.

I think LLVM keeps them as complex expressions much longer and they've been
having a harder time implementing some of the complex arith stuff we did in GCC 11.
Good to know.  Thanks.

jeff
