https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

--- Comment #3 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Vladimir Makarov from comment #2)
> (In reply to Kewen Lin from comment #1)
> > Created attachment 50715 [details]
> > ira:consider matching cstr in all alternatives
> > 
> > With little understanding on ira, I am not quite sure this patch is on the
> > reasonable direction. It aims to check the matching constraint in all
> > alternatives, if there is one alternative with matching constraint and
> > matches the current preferred regclass, it will record the output operand
> > number and further create one copy for it. Normally it can get the priority
> > against shuffle copies and the matching constraint will get satisfied with
> > higher possibility, reload doesn't create extra copies to meet the matching
> > constraint or the desirable register class when it has to.
> > 
> > For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay as
> > shuffle copies, and later any of A,B,C,D gets assigned by one hardware
> > register which is a VSX register but not a FP register, which means it has
> > to pay costs once we can NOT go with VSX alternatives, so at that time we
> > can increase the freq for the remaining copies related to this, once the
> > matching constraint gets satisfied further, there aren't any extra costs to
> > pay. This idea seems a bit complicated in the current framework, so the
> > proposed patch aggressively emphasizes the matching constraint at the time
> > of creating copies.
> > 
> > FWIW bootstrapped/regtested on powerpc64le-linux-gnu P9. The evaluation with
> > Power9 SPEC2017 all run shows 505.mcf_r +2.98%, 508.namd_r +3.37%, 519.lbm_r
> > +2.51%, no remarkable degradation is observed.
> 
> Thank you for working on this issue.
> 
> The current implementation of ira_get_dup_out_num was specifically tuned for
> better register allocation for x86-64 div insns.
> 
> Your patch definitely improves code for power9 and I would say significantly
> (congratulations!).  The patch you proposed makes me think that it might
> work for major targets as well.
> 
> I would prefer to avoid introducing new parameter because there are too many
> of them already and its description is cryptic.
> 

Thanks for your comments, Vladimir!  Yeah, Segher also thought it can benefit
other targets and suggested making it on by default, I've made this parameter
on by default in v2, if it's fine on x86-64 and aarch64 with some testing and
benchmarking later, I think we can simply get rid of the parameter as you
suggested. 

> It would be nice if you benchmark the patch on x86-64 too, If there is no
> overall degradation with new behaviour we could remove the parameter and
> make the new behaviour as a default. If it is not, well we will keep the
> parameter.
> 

Sorry that I don't have a x86-64 or aarch64 performance machine at hand, the
new version v2 was bootstrapped/regtested on powerpc64le-linux-gnu P9 and
x86_64-redhat-linux, but had some failures on aarch64, I was still
investigating it.  Once it got root-caused and fixed, I would ask around folks
to help to benchmark this.

> As for the patch itself, I don't like some variable names.  Sorry.  Could
> you use op_regno, out_regno, and present_alt instead of op_no, out_no, tot. 
> Please, in general use longer variable names reflecting their purpose as GCC
> developers reads code in many times more than writing it.

Got it, thanks for the suggestion! This part has been simplified with
recog_op_alt, hope it looks better.

Reply via email to