Hi!
On Tue, Sep 07, 2021 at 03:12:36AM -0400, Michael Meissner wrote:
> [PATCH] Fix SFmode subreg of DImode and TImode
>
> This patch fixes the breakage in the PowerPC due to a recent change in SUBREG
> behavior.
But what was that change? And was it intentional? If so, why wasn't
it documented? Was the existing behaviour considered buggy? The
documentation agrees with the previous behaviour, afaics.
> While it is arguable that the patch that caused the breakage should
> be reverted, this patch should be a bandage to prevent these changes from
> happening again.
NAK. This patch will likely cause us to generate worse code. If that
is not the case, it will need a long, in-depth explanation of why not.
Sorry.
> I first noticed it in building the Spec 2017 wrf_r and blender_r
> benchmarks. Once I applied this patch, I also noticed several of the
> tests now pass.
>
> The core of the problem is we need to treat SUBREGs of SFmode and SImode
> specially on the PowerPC. This is due to the fact that SFmode values that are
> in the vector and floating point registers are represented as DFmode. When we
> want to do a direct move between the GPR registers and the vector registers, we
> have to convert the value from the DFmode representation to/from the SFmode
> representation.
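To make the representation issue concrete, here is a small plain-C sketch (not GCC code) of the two bit images of the same single-precision value: its 32-bit memory image, which is what a GPR holds, and its 64-bit double-precision image, which is how the quoted text says an SFmode value is held in the FP/vector registers. The helper names sf_image and df_image are hypothetical, and the sketch ignores where in the 128-bit vector register the value actually sits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32-bit memory image of a float: how an SFmode value looks in a GPR
   or in memory.  */
uint32_t
sf_image (float f)
{
  uint32_t bits;
  assert (sizeof bits == sizeof f);
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* 64-bit double-precision image of the same value: a stand-in for how
   an SFmode value is held in an FP/vector register (lane placement
   inside the 128-bit register is ignored here).  */
uint64_t
df_image (float f)
{
  double d = f;                 /* widening float -> double is exact */
  uint64_t bits;
  assert (sizeof bits == sizeof d);
  memcpy (&bits, &d, sizeof bits);
  return bits;
}
```

For 1.5f, sf_image gives 0x3fc00000 while df_image gives 0x3ff8000000000000, so a direct move between the two register files cannot simply copy bits; it needs the conversion step the quoted text describes.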
The core of the problem is that subreg of pseudos has three meanings:
-- Paradoxical subregs;
-- Actual subregs;
-- "bit_cast" thingies: treat the same bits as something else. Like
looking at the bits of a float as its memory image.
Ignoring paradoxical subregs, subregs of mem (which should have
disappeared by now), and subregs of hard registers (those have
*different* semantics after all), the other two kinds can be mixed,
and *have to* be mixed, because subregs of subregs are non-canonical.
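A rough plain-C model of the two kinds of subreg being mixed, assuming lowpart subregs and IEEE single precision for SFmode. The function names are hypothetical, and this models only the value semantics, not how GCC represents or simplifies the RTL. The third function is the mixed case under discussion, (subreg:SF (reg:DI ...)): one subreg that both drops the high half and reinterprets the remaining bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* "Actual" subreg: the lowpart SImode subreg of a DImode value keeps
   the low 32 bits and drops the rest.  */
uint32_t
lowpart_si_of_di (uint64_t x)
{
  return (uint32_t) x;
}

/* "bit_cast" subreg: an SFmode subreg of an SImode value keeps all
   32 bits unchanged and only reinterprets them as an IEEE single.  */
float
bitcast_sf_of_si (uint32_t x)
{
  float f;
  assert (sizeof f == sizeof x);
  memcpy (&f, &x, sizeof f);
  return f;
}

/* Mixed: the lowpart (subreg:SF (reg:DI ...)) does both at once.
   In RTL it has to be a single subreg, since a subreg of a subreg is
   not canonical.  */
float
sf_lowpart_of_di (uint64_t x)
{
  return bitcast_sf_of_si (lowpart_si_of_di (x));
}
```

For instance, sf_lowpart_of_di (0xdeadbeef3fc00000) yields 1.5f: the high word is dropped and the low word 0x3fc00000 is read as a float. Note that bitcast_sf_of_si is not (float) x; a value conversion is its own RTL code (float), not a subreg.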
Is there any reason not to allow this kind of subreg?
If we want to not allow mixing bit_cast with subregs, we should make it
its own RTL code.
> +  /* In case we are given a SUBREG for a larger type, reduce it to
> +     SImode.  */
> +  if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
> +    {
> +      rtx tmp = gen_reg_rtx (SImode);
> +      emit_move_insn (tmp, gen_lowpart (SImode, source));
> +      emit_insn (gen_movsf_from_si (dest, tmp));
> +      return true;
> +    }
This makes it two separate insns. Is that always optimised to code that
is at least as good as before?
Segher