Hi Richard, Thanks for your insightful reply.
On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Manolis Tsamis <manolis.tsa...@vrull.eu> writes: > > noce_convert_multiple_sets has been introduced and extended over time to > > handle > > if conversion for blocks with multiple sets. Currently this is focused on > > register moves and rejects any sort of arithmetic operations. > > > > This series is an extension to allow more sequences to take part in if > > conversion. The first patch is a required change to emit correct code and > > the > > second patch whitelists a larger number of operations through > > bb_ok_for_noce_convert_multiple_sets. > > > > For targets that have a rich selection of conditional instructions, > > like aarch64, I have seen an ~5x increase of profitable if conversions for > > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of > > benchmarks and I have not seen performance regressions on either x64 / > > aarch64. > > Interesting results. Are you free to say which target you used for aarch64? > > If I've understood the cost heuristics correctly, we'll allow a "predictable" > branch to be replaced by up to 5 simple conditional instructions and an > "unpredictable" branch to be replaced by up to 10 simple conditional > instructions. That seems pretty high. And I'm not sure how well we > guess predictability in the absence of real profile information. > > So my gut instinct was that the limitations of the current code might > be saving us from overly generous limits. It sounds from your results > like that might not be the case though. > > Still, if it does turn out to be the case in future, I agree we should > fix the costs rather than hamstring the code. > My writing may have been confusing, but with "~5x increase of profitable if conversions" I just meant that ifcvt considers these profitable, not that they actually are when executed in particular hardware. But at the same time I haven't yet seen any obvious performance regressions in some benchmarks that I have ran. In any case it could be interesting to microbenchmark branches vs conditional instructions and see how sane these numbers are. > > Some samples that previously resulted in a branch but now better use these > > instructions can be seen in the provided test case. > > > > Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are > > failing with an ICE but I believe that it's not an issue of this change. > > force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ])) > > is provided through emit_conditional_move. > > I guess that needs to be fixed first though. (Thanks for checking both > targets.) > I get a feeling this may be fixed if I properly take care of your points 1 & 2 below. I will report on that. > My main comments on the series are: > > (1) It isn't obvious which operations should be included in the list > in patch 2 and which shouldn't. Also, the patch only checks the > outermost operation, and so it allows the inner rtxes to be > arbitrarily complex. > > Because of that, it might be better to remove the condition > altogether and just rely on the other routines to do costing and > correctness checks. > That is true; I wanted to somehow only allow "normal arithmetic operations" and avoid generating sequences with stranger codes. I will try and see what happens if I remove the condition altogether. I also totally missed the fact that I was allowing arbitrarily complex inner rtxes so thanks for pointing that out. > (2) Don't you also need to update the "rewiring" mechanism, to cope > with cases where the then block has something like: > > if (a == 0) { > a = b op c; -> a' = a == 0 ? b op c : a; > d = a op b; -> d = a == 0 ? a' op b : d; > } a = a' > > At the moment the code only handles regs and subregs, whereas but IIUC > it should now iterate over all the regs in the SET_SRC. And I suppose > that creates the need for multiple possible rewirings in the same insn, > so that it isn't a simple insn -> index mapping any more. > Indeed, I believe this current patch cannot properly handle these. I will create testcases for this and see what changes need to be done in the next iteration so that correct code is generated. Thanks, Manolis > Thanks, > Richard > > > > > > > Changes in v2: > > - Change "conditional moves" to "conditional instructions" > > in bb_ok_for_noce_convert_multiple_sets's comment. > > > > Manolis Tsamis (2): > > ifcvt: handle sequences that clobber flags in > > noce_convert_multiple_sets > > ifcvt: Allow more operations in multiple set if conversion > > > > gcc/ifcvt.cc | 109 ++++++++++-------- > > .../aarch64/ifcvt_multiple_sets_arithm.c | 67 +++++++++++ > > 2 files changed, 127 insertions(+), 49 deletions(-) > > create mode 100644 > > gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c