Manolis Tsamis <manolis.tsa...@vrull.eu> writes:
> On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Manolis Tsamis <manolis.tsa...@vrull.eu> writes:
>> > noce_convert_multiple_sets has been introduced and extended over time to 
>> > handle
>> > if conversion for blocks with multiple sets. Currently this is focused on
>> > register moves and rejects any sort of arithmetic operations.
>> >
>> > This series is an extension to allow more sequences to take part in if
>> > conversion. The first patch is a required change to emit correct code and 
>> > the
>> > second patch whitelists a larger number of operations through
>> > bb_ok_for_noce_convert_multiple_sets.
>> >
>> > For targets that have a rich selection of conditional instructions,
>> > like aarch64, I have seen an ~5x increase of profitable if conversions for
>> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
>> > benchmarks and I have not seen performance regressions on either x64 / 
>> > aarch64.
>>
>> Interesting results.  Are you free to say which target you used for aarch64?
>>
>> If I've understood the cost heuristics correctly, we'll allow a "predictable"
>> branch to be replaced by up to 5 simple conditional instructions and an
>> "unpredictable" branch to be replaced by up to 10 simple conditional
>> instructions.  That seems pretty high.  And I'm not sure how well we
>> guess predictability in the absence of real profile information.
>>
>> So my gut instinct was that the limitations of the current code might
>> be saving us from overly generous limits.  It sounds from your results
>> like that might not be the case though.
>>
>> Still, if it does turn out to be the case in future, I agree we should
>> fix the costs rather than hamstring the code.
>>
>
> My writing may have been confusing, but with "~5x increase of
> profitable if conversions" I just meant that ifcvt considers these
> profitable, not that they actually are when executed in particular
> hardware.

Yeah, sorry, I'd read that part as measuring the number of if-converisons.
But...

> But at the same time I haven't yet seen any obvious performance
> regressions in some benchmarks that I have ran.

...it was a pleasant surprise that doing so much more if-conversion
didn't make things noticeably worse. :)

> In any case it could be interesting to microbenchmark branches vs
> conditional instructions and see how sane these numbers are.

I think for this we really do need the real workload, since it's
hard to measure realistic branch mispredict penalties with a
microbenchmark.

> [...]
>> (2) Don't you also need to update the "rewiring" mechanism, to cope
>>     with cases where the then block has something like:
>>
>>       if (a == 0) {
>>         a = b op c;       ->    a' = a == 0 ? b op c : a;
>>         d = a op b;       ->    d = a == 0 ? a' op b : d;
>>       }                         a = a'
>>
>>     At the moment the code only handles regs and subregs, whereas but IIUC
>>     it should now iterate over all the regs in the SET_SRC.  And I suppose
>>     that creates the need for multiple possible rewirings in the same insn,
>>     so that it isn't a simple insn -> index mapping any more.
>>
>
> Indeed, I believe this current patch cannot properly handle these. I
> will create testcases for this and see what changes need to be done in
> the next iteration so that correct code is generated.

Perhaps we should change the way that the rewiring is done.
At the moment, need_cmov_or_rewire detects the renumbering
ahead of time.  But it might be easier to:

- have noce_convert_multiple_sets_1 keep track of which
  SET_DESTs it has replaced with temporaries.

- for each subsequent instruction, go through that list in order
  and use insn_propagation (from recog.h) to apply each replacement.

That might be simpler, and should also be more robust, since the
insn_propagation routines return false on failure.

Thanks,
Richard

Reply via email to