On 6/5/23 00:29, Richard Biener wrote:
But then for example x86 has smaller encoding for byte ops and while
widening is easily done later, truncation is not.
Sadly, the x86 costing looks totally bogus here. We actually emit the
exact same code for a QI mode loads vs a zero-extending load from QI to
SI. But the costing is different and would tend to prefer QImode. That
in turn is going to force an extension at the end of the sequence which
would be a regression relative to the current code. Additionally we may
get partial register stalls for the byte ops to implement the comparison
steps.
The net result is that querying the backend's costs would do the exact
opposite of what I think we want on x86. One could argue the x86
maintainers should improve this situation...
Note I would have expected to use the mode of the load so we truly
elide some extensions, using word_mode looks like just another
mode here? The key to note is probably
op0 = convert_modes (mode, unit_mode, op0, 1);
op1 = convert_modes (mode, unit_mode, op1, 1);
rtx diff = expand_simple_binop (mode, MINUS, op0, op1,
result, 1, OPTAB_WIDEN);
which uses OPTAB_WIDEN - wouldn't it be better to pass in the
unconverted modes and leave the decision which mode to use
to OPTAB_WIDEN? Should we somehow query the target for
the smallest mode from unit_mode it can do both the MINUS
and the compare?
And avoiding OPTAB_WIDEN isn't going to help rv64 at all. The core
issue being that we do define 32bit ops. With Jivan's patch those 32bit
ops expose the sign extending nature. So a 32bit add would look
something like
(set (temp:DI) (sign_extend:DI (plus:SI (op:SI) (op:SI))))
(set (res:SI) (subreg:SI (temp:DI) 0)
Where we mark the subreg with SUBREG_PROMOTED_VAR_P.
I'm not sure the best way to proceed now. I could just put this on the
back-burner as it's RISC-V specific and the gains elsewhere dwarf this
issue.
jeff