Hi! On Thu, Apr 23, 2020 at 05:08:55AM -0300, Alexandre Oliva wrote: > The emulation of mffsl with mffs, used when !TARGET_P9_MISC, is going > through the motions, but not storing the result in the given > operands[0]; it rather modifies operands[0] without effect.
Heh, oops. > It also > creates a DImode pseudo that it doesn't use, a DFmode pseudo that's > unnecessary AFAICT, and it's not indented per the GNU Coding > Standards. The patch below fixes all of these. > > I wasn't sure simplify_gen_subreg might possibly emit any code in > obscure cases, It never does, it just returns an RTX. Where would it emit the insns to? > so I left it in place, and accommodated the possibility > that the result of the mode conversion back might need copying to the > requested output operand. In my tests, the output was always a > REG:DF, so the subreg was trivial and the conversion back got the > original REG:DF output back. > I'm concerned about several issues in the mffsl testcase. First, I > don't see that comparing the values as doubles rather than as long > longs is desirable. These are FPSCR bitfields, not FP numbers. I > understand mffs et al use double because they output to FP registers, > but... The bit patterns might not even be well-formed FP numbers. "Desirable", probably not, no. But it will always work: since all the top bits are zeros always, it will always be a subnormal number, so all comparisons will work as expected / wanted. > Another issue with the test is that, if the compare fails, it calls > mffsl again to print the value, as if it would yield the same result. > But part of the FPSCR that mffsl (emulated with mmfl or not) copies to > the output FP register is the FPCC, so the fcmpu used to compare the > result of the first mmfsl will modify FPSCR and thus the result of the > second mmfsl call. Yeah, good point. We never *use* FPCC (nowhere, not just in this test), but that is somewhat beside the point. OTOH, all this is only done if we are debugging the testcase, so it isn't important either way. > Yet another issue is that the test assumed the mmfs bits not extracted > by mffsl are all zero. This appears to be the case, as the bits left > out are for sticky exceptions, but there are reserved parts of FPSCR > that might turn out to be set in the future, and then the masking in > the GCC-emulated version of mffsl would zero out those bits and cause > the compare to fail. All those extra bits are required to be set to 0. From the ISA: For Move From FPSCR Lightweight (mffsl), do the following. The contents of the control bits in the FPSCR, that is, bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), and the non-sticky status bits in the FPSCR, that is, bits 45:51 (FR, FI, C, FL, FG, FE, FU), are placed into the corresponding bits in register FRT. All other bits in register FRT are set to 0. > So I put in masking in the mffs result before the compare, but then, > what if mffsl is changed so that it copies additional nonzero bits? > Should we mask both mffs and mffsl outputs? Or is it safe to leave > those bits alone and assume them to be zero at the entry point of > main(), as the test used to do? The code as-is was correct here (the compiler code as well as the testcase code). > for gcc/ChangeLog > > * gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to > output operand in emulation. Simplify. Indent with a tab please. > + rtx tmp_df = operands[0]; Please don't reuse pseudos (or worse, non-pseudo registers). This was correct in the original code. > + rtx tmp_di; Please just declare at first use, like the original did. It's the more modern style, and it actually is nicer ;-) > + /* The mffs instruction reads the entire FPSCR. Emulate the mffsl > + instruction using the mffs instruction and masking off the bits > + the mmsl instruciton actually reads. */ "mffsl instruction". Heh, I see the original was bad already, oops. As I said above, the insn is *required* to zero all other bits. > + emit_insn (gen_rs6000_mffs (tmp_df)); > + tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); > + emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL))); > + > + tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); > + if (operands[0] != tmp_df) > + emit_move_insn (operands[0], tmp_df); > + DONE; That "==" won't do what you want, I guess? It's not needed, anyway, it is just a distracting premature micro-optimisation. So just do the emit_move_insn please, with whitespace and comment fixed if you want? Thanks, Segher (Is there a PR, btw?)