Re: [PATCH] widening_mul, i386, v2: Improve spaceship expansion on x86 [PR103973]

Jakub Jelinek via Gcc-patches Mon, 17 Jan 2022 04:36:28 -0800

On Mon, Jan 17, 2022 at 01:04:40PM +0100, Richard Biener wrote:
> > I guess it depends, for code that can only be called during the expand pass
> > dropping it should be just fine, for code that can be called also (or only)
> > later I think adding JUMP_LABEL and correct LABEL_NUSES is needed because
> > nothing will fix it up afterwards.
> 
> I'm noting that
> 
> +  /* BB must have no executable statements.  */
> +  gimple_stmt_iterator gsi = gsi_after_labels (bb);
> +  if (phi_nodes (bb))
> +    return false;
> 
> disallows blocks with just a virtual PHI which wouldn't be
> "executable".  Not sure if anything will break when we fix that.


Note I'm only moving the existing function from phiopt to tree-cfg.c
so that I can use it from tree-ssa-math-opts.c.  But all the
cond_only_block_p uses in phiopt and now in tree-ssa-maht-opts.c too
only call it on single_pred_p (bb) basic blocks, so I don't see
what the virtual PHI on those would be good for.

> For code generation we rely on RTL opts to merge compare/scc
> and the subsequent branches on -1/0/1/[-2], correct?  I wonder
> whether that works on other targets as well or whether a
> asm-goto with "optab" UNSPEC text would be more forward looking?

Yes, we rely on some RTL opts, like we rely on it for e.g. the overflow
builtins or various other cases and they seem to be doing their job
well on my testing.  Initially I thought the optab would have 6 arguments,
2 comparison args and 4 labels and I'd emit a switch in the
tree-ssa-math-opts.c (I even wrote such code).  But it didn't work really
well, the switch in some cases wasn't really optimized, and optimization
passes after the widening_mul liked e.g. to propagate the .SPACESHIP
lhs into some but not all the PHI args if there were any etc.
Emitting a function that returns -1/0/1/2 worked better, especially if
the target attempts to emit it as a series of conditional jumps
with small bbs that just set those values.  RTL opts later on will
merge the jumps with further jumps that test the .SPACESHIP result,
or will turn some of the conditional jumps into scc etc.

> The restriction to scalar floats is probably because with
> scalar integers we're doing fine and with vectors we'd need some
> very much different tricks, right?

Sure, for vectors we couldn't use branches etc.  
I'm not really sure how would one write a vector version of the
spaceship actually.  The primary use case is C++ with <=>, but <=>
returns std::*_ordering which is an aggregate and one that isn't
very easy to turn into an integer even, switch doesn't work,
only if (... == std::partial_ordering::equality) ... else if (...
(unless I'm missing something).
But even in C, maybe:
typedef float V __attribute__((vector_size (16)));
typedef int W __attribute__((vector_size (16)));

W
foo (V x, V y)
{
  return (x != y) & (((x < y) & (W) { -1, -1, -1, -1 }) | ((x > y) & (W) { 1, 
1, 1, 1 }) | ((W) { 2, 2, 2, 2 } & ~(x < y) & ~(x > y)));
}

but it isn't clear how I'd optimize it at the assembly, where
we currently emit:
        vbroadcastss    .LC2(%rip), %xmm2
        vcmpltps        %xmm0, %xmm1, %xmm3
        vbroadcastss    .LC3(%rip), %xmm4
        vpand   %xmm3, %xmm2, %xmm2
        vcmpltps        %xmm1, %xmm0, %xmm3
        vpor    %xmm3, %xmm2, %xmm2
        vcmpneq_oqps    %xmm1, %xmm0, %xmm3
        vcmpneqps       %xmm1, %xmm0, %xmm0
        vpandn  %xmm4, %xmm3, %xmm3
        vpor    %xmm3, %xmm2, %xmm2
        vpand   %xmm0, %xmm2, %xmm0
        ret
.LC2:
        .long   1
        .align 4
.LC3:
        .long   2

> The middle-end changes look OK, I don't see anything that
> couldn't be changed if other targets run into problems with
> getting similar optimized code.

Thanks.

        Jakub

Re: [PATCH] widening_mul, i386, v2: Improve spaceship expansion on x86 [PR103973]

Reply via email to