On 6/5/23 00:29, Richard Biener wrote:


But then for example x86 has smaller encoding for byte ops and while
widening is easily done later, truncation is not.
Sadly, the x86 costing looks totally bogus here. We actually emit the exact same code for a QI mode loads vs a zero-extending load from QI to SI. But the costing is different and would tend to prefer QImode. That in turn is going to force an extension at the end of the sequence which would be a regression relative to the current code. Additionally we may get partial register stalls for the byte ops to implement the comparison steps.

The net result is that querying the backend's costs would do the exact opposite of what I think we want on x86. One could argue the x86 maintainers should improve this situation...


Note I would have expected to use the mode of the load so we truly
elide some extensions, using word_mode looks like just another
mode here?  The key to note is probably

       op0 = convert_modes (mode, unit_mode, op0, 1);
       op1 = convert_modes (mode, unit_mode, op1, 1);
       rtx diff = expand_simple_binop (mode, MINUS, op0, op1,
                                       result, 1, OPTAB_WIDEN);

which uses OPTAB_WIDEN - wouldn't it be better to pass in the
unconverted modes and leave the decision which mode to use
to OPTAB_WIDEN?  Should we somehow query the target for
the smallest mode from unit_mode it can do both the MINUS
and the compare?
And avoiding OPTAB_WIDEN isn't going to help rv64 at all. The core issue being that we do define 32bit ops. With Jivan's patch those 32bit ops expose the sign extending nature. So a 32bit add would look something like

(set (temp:DI) (sign_extend:DI (plus:SI (op:SI) (op:SI))))
(set (res:SI) (subreg:SI (temp:DI) 0)

Where we mark the subreg with SUBREG_PROMOTED_VAR_P.


I'm not sure the best way to proceed now. I could just put this on the back-burner as it's RISC-V specific and the gains elsewhere dwarf this issue.


jeff

Reply via email to