https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116312
ktkachov at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |INVALID
Status|UNCONFIRMED |RESOLVED
--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> >but we could implement it as a simple final assembly output template change
> >for minimal invasion.
>
> No you can't since ldp and ld2 mean 2 different things.
>
> ld2 is basically a perm to unmix the two registers. that is load lanes.
>
> Note in the GCC case there is only one fadd while in LLVM there are 2 though
> independent.
>
> so the question becomes is the ldp better than ld2 here? overall or just
> looking at the ldp vs ld2?
Yeah you're right, it's too early in the morning for me...
It could be that LLVM's vectorisation approach here is better, but that's a
separate discussion.
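
For anyone reading this later, here is a rough scalar model of why swapping the
output template from ld2 to ldp would not work. It is only a sketch: the lane
count (four 32-bit floats per 128-bit register) and the names model_ldp,
model_ld2 and ldp_vs_ld2_demo are assumptions made for illustration, not taken
from the testcase in this PR. ldp fills the two registers from consecutive
memory, while ld2 de-interleaves even and odd elements into them.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumption: four 32-bit lanes per 128-bit register */

/* Scalar model of `ldp q0, q1, [x0]`: two back-to-back 128-bit loads.
   r0 gets the first LANES elements, r1 the next LANES elements. */
static void model_ldp(const float *mem, float r0[LANES], float r1[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        r0[i] = mem[i];
        r1[i] = mem[LANES + i];
    }
}

/* Scalar model of `ld2 {v0.4s, v1.4s}, [x0]`: a de-interleaving load.
   r0 gets the even-indexed elements, r1 the odd-indexed ones. */
static void model_ld2(const float *mem, float r0[LANES], float r1[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        r0[i] = mem[2 * i];
        r1[i] = mem[2 * i + 1];
    }
}

/* Hypothetical demo: the same 8 floats land in different lanes. */
void ldp_vs_ld2_demo(void)
{
    const float mem[2 * LANES] = {0, 1, 2, 3, 4, 5, 6, 7};
    float a[LANES], b[LANES];

    model_ldp(mem, a, b);
    /* a = {0,1,2,3}, b = {4,5,6,7} */
    assert(a[1] == 1.0f && b[0] == 4.0f);

    model_ld2(mem, a, b);
    /* a = {0,2,4,6}, b = {1,3,5,7} */
    assert(a[1] == 2.0f && b[0] == 1.0f);
}
```

Both forms read the same 32 bytes, but the register contents differ, so any
code consuming the registers has to be generated for one layout or the other.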