Hi! On Tue, Sep 07, 2021 at 03:12:36AM -0400, Michael Meissner wrote: > [PATCH] Fix SFmode subreg of DImode and TImode > > This patch fixes the breakage in the PowerPC due to a recent change in SUBREG > behavior.
But what was that change? And was that intentional? If so, why wasn't it documented, was the existing behaviour considered buggy? But the documentation agrees with the previous behaviour afaics. > While it is arguable that the patch that caused the breakage should > be reverted, this patch should be a bandage to prevent these changes from > happening again. NAK. This patch will likely cause us to generate worse code. If that is not the case it will need a long, in-depth explanation of why not. Sorry. > I first noticed it in building the Spec 2017 wrf_r and blender_r > benchmarks. Once I applied this patch, I also noticed several of the > tests now pass. > > The core of the problem is we need to treat SUBREG's of SFmode and SImode > specially on the PowerPC. This is due to the fact that SFmode values that are > in the vector and floating point registers are represented as DFmode. When we > want to do a direct move between the GPR registers and the vector registers, > we > have to convert the value from the DFmode representation to/from the SFmode > representation. The core of the problem is that subreg of pseudos has three meanings: -- Paradoxical subregs; -- Actual subregs; -- "bit_cast" thingies: treat the same bits as something else. Like looking at the bits of a float as its memory image. Ignoring paradoxical subregs (as well as subregs of mem, which should have disappeared by now), and subregs of hard registers as well (those have *different* semantics after all), the other two kinds can be mixed, and *have to* be mixed, because subregs of subregs are non-canonical. Is there any reason why not to allow this kind of subreg? If we want to not allow mixing bit_cast with subregs, we should make it its own RTL code. > + /* In case we are given a SUBREG for a larger type, reduce it to > + SImode. */ > + if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4) > + { > + rtx tmp = gen_reg_rtx (SImode); > + emit_move_insn (tmp, gen_lowpart (SImode, source)); > + emit_insn (gen_movsf_from_si (dest, tmp)); > + return true; > + } This makes it two separate insns. Is that always optimised to code that is at least as good as before? Segher