On Mon, Jan 17, 2022 at 01:04:40PM +0100, Richard Biener wrote: > > I guess it depends, for code that can only be called during the expand pass > > dropping it should be just fine, for code that can be called also (or only) > > later I think adding JUMP_LABEL and correct LABEL_NUSES is needed because > > nothing will fix it up afterwards. > > I'm noting that > > + /* BB must have no executable statements. */ > + gimple_stmt_iterator gsi = gsi_after_labels (bb); > + if (phi_nodes (bb)) > + return false; > > disallows blocks with just a virtual PHI which wouldn't be > "executable". Not sure if anything will break when we fix that.
Note I'm only moving the existing function from phiopt to tree-cfg.c so that I can use it from tree-ssa-math-opts.c. But all the cond_only_block_p uses in phiopt and now in tree-ssa-maht-opts.c too only call it on single_pred_p (bb) basic blocks, so I don't see what the virtual PHI on those would be good for. > For code generation we rely on RTL opts to merge compare/scc > and the subsequent branches on -1/0/1/[-2], correct? I wonder > whether that works on other targets as well or whether a > asm-goto with "optab" UNSPEC text would be more forward looking? Yes, we rely on some RTL opts, like we rely on it for e.g. the overflow builtins or various other cases and they seem to be doing their job well on my testing. Initially I thought the optab would have 6 arguments, 2 comparison args and 4 labels and I'd emit a switch in the tree-ssa-math-opts.c (I even wrote such code). But it didn't work really well, the switch in some cases wasn't really optimized, and optimization passes after the widening_mul liked e.g. to propagate the .SPACESHIP lhs into some but not all the PHI args if there were any etc. Emitting a function that returns -1/0/1/2 worked better, especially if the target attempts to emit it as a series of conditional jumps with small bbs that just set those values. RTL opts later on will merge the jumps with further jumps that test the .SPACESHIP result, or will turn some of the conditional jumps into scc etc. > The restriction to scalar floats is probably because with > scalar integers we're doing fine and with vectors we'd need some > very much different tricks, right? Sure, for vectors we couldn't use branches etc. I'm not really sure how would one write a vector version of the spaceship actually. The primary use case is C++ with <=>, but <=> returns std::*_ordering which is an aggregate and one that isn't very easy to turn into an integer even, switch doesn't work, only if (... == std::partial_ordering::equality) ... else if (... (unless I'm missing something). But even in C, maybe: typedef float V __attribute__((vector_size (16))); typedef int W __attribute__((vector_size (16))); W foo (V x, V y) { return (x != y) & (((x < y) & (W) { -1, -1, -1, -1 }) | ((x > y) & (W) { 1, 1, 1, 1 }) | ((W) { 2, 2, 2, 2 } & ~(x < y) & ~(x > y))); } but it isn't clear how I'd optimize it at the assembly, where we currently emit: vbroadcastss .LC2(%rip), %xmm2 vcmpltps %xmm0, %xmm1, %xmm3 vbroadcastss .LC3(%rip), %xmm4 vpand %xmm3, %xmm2, %xmm2 vcmpltps %xmm1, %xmm0, %xmm3 vpor %xmm3, %xmm2, %xmm2 vcmpneq_oqps %xmm1, %xmm0, %xmm3 vcmpneqps %xmm1, %xmm0, %xmm0 vpandn %xmm4, %xmm3, %xmm3 vpor %xmm3, %xmm2, %xmm2 vpand %xmm0, %xmm2, %xmm0 ret .LC2: .long 1 .align 4 .LC3: .long 2 > The middle-end changes look OK, I don't see anything that > couldn't be changed if other targets run into problems with > getting similar optimized code. Thanks. Jakub