On Mon, Jun 5, 2023 at 8:41 PM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 6/5/23 00:29, Richard Biener wrote:
>
> >
> > But then for example x86 has smaller encoding for byte ops and while
> > widening is easily done later, truncation is not.
> Sadly, the x86 costing looks totally bogus here.  We actually emit the
> exact same code for a QI mode loads vs a zero-extending load from QI to
> SI.  But the costing is different and would tend to prefer QImode.  That
> in turn is going to force an extension at the end of the sequence which
> would be a regression relative to the current code.  Additionally we may
> get partial register stalls for the byte ops to implement the comparison
> steps.
>
> The net result is that querying the backend's costs would do the exact
> opposite of what I think we want on x86.  One could argue the x86
> maintainers should improve this situation...
>
> >
> > Note I would have expected to use the mode of the load so we truly
> > elide some extensions, using word_mode looks like just another
> > mode here?  The key to note is probably
> >
> >        op0 = convert_modes (mode, unit_mode, op0, 1);
> >        op1 = convert_modes (mode, unit_mode, op1, 1);
> >        rtx diff = expand_simple_binop (mode, MINUS, op0, op1,
> >                                        result, 1, OPTAB_WIDEN);
> >
> > which uses OPTAB_WIDEN - wouldn't it be better to pass in the
> > unconverted modes and leave the decision which mode to use
> > to OPTAB_WIDEN?  Should we somehow query the target for
> > the smallest mode from unit_mode it can do both the MINUS
> > and the compare?
> And avoiding OPTAB_WIDEN isn't going to help rv64 at all.  The core
> issue being that we do define 32bit ops.  With Jivan's patch those 32bit
> ops expose the sign extending nature.  So a 32bit add would look
> something like
>
> (set (temp:DI) (sign_extend:DI (plus:SI (op:SI) (op:SI))))
> (set (res:SI) (subreg:SI (temp:DI) 0)
>
> Where we mark the subreg with SUBREG_PROMOTED_VAR_P.
>
>
> I'm not sure the best way to proceed now.  I could just put this on the
> back-burner as it's RISC-V specific and the gains elsewhere dwarf this
> issue.

I wonder if there's some more generic target macro we can key the
behavior off - SLOW_BYTE_ACCESS isn't a good fit, WORD_REGISTER_OPERATIONS
is maybe closer but it's exact implications are unknown to me.  Maybe
there's something else as well ...

The point about OPTAB_WIDEN above was that I wonder why we
extend 'op0' and 'op1' before emitting the binop when we allow WIDEN
anyway.  Yes, we want the result in 'mode' (but why?  As you say we
can extend at the end) and there's likely no way to tell expand_simple_binop
to "expand as needed and not narrow the result".  So I wonder if we should
emulate that somehow (also taking into consideration the compare).

Richard.

>
> jeff

Reply via email to