Manolis Tsamis <manolis.tsa...@vrull.eu> writes: > On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> Manolis Tsamis <manolis.tsa...@vrull.eu> writes: >> > noce_convert_multiple_sets has been introduced and extended over time to >> > handle >> > if conversion for blocks with multiple sets. Currently this is focused on >> > register moves and rejects any sort of arithmetic operations. >> > >> > This series is an extension to allow more sequences to take part in if >> > conversion. The first patch is a required change to emit correct code and >> > the >> > second patch whitelists a larger number of operations through >> > bb_ok_for_noce_convert_multiple_sets. >> > >> > For targets that have a rich selection of conditional instructions, >> > like aarch64, I have seen an ~5x increase of profitable if conversions for >> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of >> > benchmarks and I have not seen performance regressions on either x64 / >> > aarch64. >> >> Interesting results. Are you free to say which target you used for aarch64? >> >> If I've understood the cost heuristics correctly, we'll allow a "predictable" >> branch to be replaced by up to 5 simple conditional instructions and an >> "unpredictable" branch to be replaced by up to 10 simple conditional >> instructions. That seems pretty high. And I'm not sure how well we >> guess predictability in the absence of real profile information. >> >> So my gut instinct was that the limitations of the current code might >> be saving us from overly generous limits. It sounds from your results >> like that might not be the case though. >> >> Still, if it does turn out to be the case in future, I agree we should >> fix the costs rather than hamstring the code. >> > > My writing may have been confusing, but with "~5x increase of > profitable if conversions" I just meant that ifcvt considers these > profitable, not that they actually are when executed in particular > hardware.
Yeah, sorry, I'd read that part as measuring the number of if-converisons. But... > But at the same time I haven't yet seen any obvious performance > regressions in some benchmarks that I have ran. ...it was a pleasant surprise that doing so much more if-conversion didn't make things noticeably worse. :) > In any case it could be interesting to microbenchmark branches vs > conditional instructions and see how sane these numbers are. I think for this we really do need the real workload, since it's hard to measure realistic branch mispredict penalties with a microbenchmark. > [...] >> (2) Don't you also need to update the "rewiring" mechanism, to cope >> with cases where the then block has something like: >> >> if (a == 0) { >> a = b op c; -> a' = a == 0 ? b op c : a; >> d = a op b; -> d = a == 0 ? a' op b : d; >> } a = a' >> >> At the moment the code only handles regs and subregs, whereas but IIUC >> it should now iterate over all the regs in the SET_SRC. And I suppose >> that creates the need for multiple possible rewirings in the same insn, >> so that it isn't a simple insn -> index mapping any more. >> > > Indeed, I believe this current patch cannot properly handle these. I > will create testcases for this and see what changes need to be done in > the next iteration so that correct code is generated. Perhaps we should change the way that the rewiring is done. At the moment, need_cmov_or_rewire detects the renumbering ahead of time. But it might be easier to: - have noce_convert_multiple_sets_1 keep track of which SET_DESTs it has replaced with temporaries. - for each subsequent instruction, go through that list in order and use insn_propagation (from recog.h) to apply each replacement. That might be simpler, and should also be more robust, since the insn_propagation routines return false on failure. Thanks, Richard