Hi!

On Fri, Apr 24, 2020 at 04:52:46AM -0300, Alexandre Oliva wrote:
> On Apr 23, 2020, Segher Boessenkool <seg...@kernel.crashing.org> wrote:
> > On Thu, Apr 23, 2020 at 05:08:55AM -0300, Alexandre Oliva wrote:
> >> I wasn't sure simplify_gen_subreg might possibly emit any code in
> >> obscure cases,
> 
> > It never does, it just returns an RTX.  Where would it emit the insns to?
> 
> I'm pretty sure there are or were subreg-extraction functions that could
> emit new insns onto whatever insn stream was active in certain unusual
> circumstances;

Yeah, I think so too.

> I wasn't sure whether this was one of them.

Doesn't look that way, thankfully :-)

> > since all the top bits are zeros always, it will always be a subnormal
> > number, so all comparisons will work as expected / wanted.
> 
> *nod*, as long as there's no trapping on subnormals.

There isn't :-)  I did say this isn't clean at all, just *happens* to
work, right?  :-)

> >> Yet another issue is that the test assumed the mmfs bits not extracted
> >> by mffsl are all zero.
> 
> > All those extra bits are required to be set to 0.
> 
> Not in the mffs output, no.  See, at this point I'm not talking about
> the emulation, but about the mffs use in the test proper.

Hrm, I don't see it then.

> Consider the case of emulated mffsl: we end up comparing the mffs output
> value with a masked version thereof.  If any of the mffs output bits
> could possibly be nonzero, the compare would fail, not because the
> emulation is wrong, but because the test is comparing the output of
> mffsl with that of mffs, with all its bits.

Oh.  I thought that was masked as well.

> In my testing, mffs output zero in this program, so it worked, but are
> all the non-mffsl bits of FPSCR guaranteed to be zero, so the output of
> mffs doesn't have to be masked out by the test?

No, and there are many bits not yet defined, so it may change for future
versions of the ISA.

> What if some future CPU defines some of the reserved bits in FPSCR in a
> way that they should be nonzero?

In a way different between mffs and mffsl.  Yeah.

> (And, could the ISA of such a future CPU possibly specify that mffsl
> should preserve those extra bits from FPSCR, rather than zero them out?)

Yes, I don't see why not?  Hrm, the exact language for the mffsl
description does not allow that, but that wouldn't be the best thing to
reply on.

> >> for  gcc/ChangeLog
> >> 
> >> * gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to
> >> output operand in emulation.  Simplify.
> 
> > Indent with a tab please.
> 
> Erhm, what I posted had TABs there.  Did it get mangled? :-(

Yes, to eight spaces, so that aligned "output" under "gcc".

> >> +      rtx tmp_df = operands[0];
> 
> > Please don't reuse pseudos
> 
> Aah, I see, that's what the original tmp_di was trying to accomplish!

:-)

> I'm familiar with that general guidance, but I didn't think there's much
> for optimizers to do with an mffs output, and the predicate ensures we
> have a reg, but...  I suppose additional pseudos make the subregging
> safer, and not adding extra uses for operands[0] might help with
> register allocation for it, so...  How about this, to also avoid reusing
> tmpdi as anddi input and output?

No pseudo should be assigned to more than once.  A la SSA, or "webs".
Many of our optimisers do not work well otherwise.

>       rtx tmp_df = gen_reg_rtx (DFmode);
> 
>       /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
>        instruction using the mffs instruction and masking off the bits
>        the mmsl instruction actually reads.  */
>       emit_insn (gen_rs6000_mffs (tmp_df));
> 
>       rtx tmp_df_as_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
>       rtx tmp_di = gen_reg_rtx (DImode);
>       emit_insn (gen_anddi3 (tmp_di, tmp_df_as_di, GEN_INT (0x70007f0ffLL)));
> 
>       rtx tmp_di_as_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
>       emit_move_insn (operands[0], tmp_di_as_df);
>       DONE;

Sure, that looks fine.  Maybe tmp1, tmp2, etc. woukld be clearer here?
:-)

> > "mffsl instruction".  Heh, I see the original was bad already, oops.
> 
> Double oops.  I had that typo fixed in some earlier version of the
> patch, but somehow I managed to drop that fix.
> 
> >> +      if (operands[0] != tmp_df)
> >> +  emit_move_insn (operands[0], tmp_df);
> >> +      DONE;
> 
> > That "==" won't do what you want, I guess?
> 
> If the incoming operands[0] is a REG, we'd be back to it after
> subregging back and forth; if it was a SUBREG to begin with, the cheap
> test would fail and we'd output the redundant move, that later passes
> will catch.  Anyway, that's gone in the snippet above.
> 
> > (Is there a PR, btw?)
> 
> Not that I know of; we've hit the testcase FAIL in our own testing.

Could you open one?  This patch will need backporting, etc.  I can do
it, if you really hate bugzilla, but I think it is easier for you to
do it :-)

Thanks!


Segher

Reply via email to