Hi!

On Tue, Sep 07, 2021 at 03:12:36AM -0400, Michael Meissner wrote:
> [PATCH] Fix SFmode subreg of DImode and TImode
> 
> This patch fixes the breakage in the PowerPC due to a recent change in SUBREG
> behavior.

But what was that change?  And was it intentional?  If so, why wasn't
it documented?  Was the existing behaviour considered buggy?  The
documentation agrees with the previous behaviour, afaics.

> While it is arguable that the patch that caused the breakage should
> be reverted, this patch should be a bandage to prevent these changes from
> happening again.

NAK.  This patch will likely cause us to generate worse code.  If that
is not the case it will need a long, in-depth explanation of why not.

Sorry.

> I first noticed it in building the Spec 2017 wrf_r and blender_r
> benchmarks.  Once I applied this patch, I also noticed several of the
> tests now pass.
> 
> The core of the problem is we need to treat SUBREG's of SFmode and SImode
> specially on the PowerPC.  This is due to the fact that SFmode values that are
> in the vector and floating point registers are represented as DFmode.  When we
> want to do a direct move between the GPR registers and the vector registers, we
> have to convert the value from the DFmode representation to/from the SFmode
> representation.
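
(To make the representation point concrete, here is a rough sketch in
plain C.  It only models the idea: the register is assumed to hold an
ordinary IEEE double, and the helper names df_image, sf_image,
naive_low32 and converting_move are made up for illustration, they are
not anything in the rs6000 backend.)

```c
#include <stdint.h>
#include <string.h>

/* Bit image of a double: stands in for an FP/vector register that keeps
   an SFmode value in DFmode format (an assumption of this sketch).  */
static uint64_t df_image(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Bit image of a float: what a GPR should hold for an SFmode value.  */
static uint32_t sf_image(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Naive "lowpart": just take the low 32 bits of the register image.  */
static uint32_t naive_low32(uint64_t reg)
{
    return (uint32_t) reg;
}

/* Converting move: read the register as a double, narrow it to float,
   and only then take the 32-bit image.  */
static uint32_t converting_move(uint64_t reg)
{
    double d;
    memcpy(&d, &reg, sizeof d);
    return sf_image((float) d);
}
```

For 1.0 the double image is 0x3ff0000000000000 and the float image is
0x3f800000, so neither half of the 64-bit image is the float image; only
the converting variant gives the right 32 bits.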

The core of the problem is that subreg of pseudos has three meanings:
  -- Paradoxical subregs;
  -- Actual subregs;
  -- "bit_cast" thingies: treat the same bits as something else.  Like
     looking at the bits of a float as its memory image.
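
(As a loose C analogy for the last two kinds, not RTL semantics: an
actual subreg is like picking part of a wider integer, while a bit_cast
is like reading the same bits through a different type.  The function
names low_word and float_bits are hypothetical.)

```c
#include <stdint.h>
#include <string.h>

/* "Actual subreg" analogy: select the low 32 bits of a 64-bit integer.  */
static uint32_t low_word(uint64_t x)
{
    return (uint32_t) x;
}

/* "bit_cast" analogy: same 32 bits, viewed as an integer instead of a
   float, i.e. the float's memory image.  No value conversion happens.  */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

A single (subreg:SF (reg:DI)) asks for both at once: pick the low part
and reinterpret it, which is why the two kinds end up mixed.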

Ignoring paradoxical subregs (as well as subregs of mem, which should
have disappeared by now), and subregs of hard registers as well (those
have *different* semantics after all), the other two kinds can be mixed,
and *have to* be mixed, because subregs of subregs are non-canonical.

Is there any reason not to allow this kind of subreg?

If we want to not allow mixing bit_cast with subregs, we should make it
its own RTL code.

> +      /* In case we are given a SUBREG for a larger type, reduce it to
> +         SImode.  */
> +      if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
> +        {
> +          rtx tmp = gen_reg_rtx (SImode);
> +          emit_move_insn (tmp, gen_lowpart (SImode, source));
> +          emit_insn (gen_movsf_from_si (dest, tmp));
> +          return true;
> +        }

This makes it two separate insns.  Is that always optimised to code that
is at least as good as before?


Segher
