Qing Zhao <qing.z...@oracle.com> writes: >>> + } >>> + return need_zeroed_hardregs; >>> +} >>> + >>> >>> With the small testing case: >>> int >>> test () >>> { >>> return 1; >>> } >>> >>> If I compiled it with >>> >>> /home/qinzhao/Install/latest/bin/gcc -O2 -fzero-call-used-regs=all-arg t.c >>> >>> It will failed as: >>> >>> t.c: In function ‘test’: >>> t.c:6:1: error: insn does not satisfy its constraints: >>> 6 | } >>> | ^ >>> (insn 28 27 29 (set (reg:DI 33 1) >>> (const_int 0 [0])) "t.c":6:1 647 {*movdi_internal64} >>> (nil)) >>> during RTL pass: shorten >>> dump file: t.c.319r.shorten >>> t.c:6:1: internal compiler error: in extract_constrain_insn_cached, at >>> recog.c:2207 >>> 0x1018d693 _fatal_insn(char const*, rtx_def const*, char const*, int, char >>> const*) >>> ../../latest-gcc-x86/gcc/rtl-error.c:108 >>> 0x1018d6e7 _fatal_insn_not_found(rtx_def const*, char const*, int, char >>> const*) >>> ../../latest-gcc-x86/gcc/rtl-error.c:118 >>> 0x1099a82b extract_constrain_insn_cached(rtx_insn*) >>> ../../latest-gcc-x86/gcc/recog.c:2207 >>> 0x11393917 insn_min_length(rtx_insn*) >>> ../../latest-gcc-x86/gcc/config/rs6000/rs6000.md:721 >>> 0x105bece3 shorten_branches(rtx_insn*) >>> ../../latest-gcc-x86/gcc/final.c:1118 >>> >>> >>> As I checked, when the FP registers are zeroed, the above failure happened. >>> >>> I suspect that the issue still relate to the following statement: >>> >>> machine_mode mode = reg_raw_mode[regno]; >>> >>> As I checked, the reg_raw_mode always return the integer mode that can be >>> hold by the hard registers, even though it’s FP register. >> >> Well, more precisely: it's the largest mode that the target allows the >> registers to hold. If there are multiple candidate modes of the same >> size, the integer one wins, like you say. But the point is that DI only >> wins over DF because the target allows both DI and DF to be stored in >> the register, and therefore supports both DI and DF moves for that >> register. >> >> So I don't think the mode is the issue. Integer zero and floating-point >> zero have the same bit representation after all. > > theoritically yes. > However, as we have noticed in Aarch64, the integer TI move has not been > supported before your fix today. As a result, the Ti move have to be splitted. > With your fix today on aarch64, Yes, the default implementation works well > for those vector registers. Thanks a lot. > > Potentially there will be other targets that have the same issue. Then those > targets need to fix those issues too in order to make the default > implementation work.
Right. But that's not a bad thing. My point above was that what you describe was not the issue for Power. AIUI the issue there was… >> AIUI, without VSX, Power needs to load the zero from the constant pool. …this instead. >>> So, I still wondering: >>> >>> 1. Is there another available utility routine that returns the proper MODE >>> for the hard registers that can be readily used to zero the hard register? >>> 2. If not, should I add one more target hook for this purpose? i.e >>> >>> /* Return the proper machine mode that can be used to zero this hard >>> register specified by REGNO. */ >>> machine_mode zero-call-used-regs-mode (unsigned int REGNO) >>> >>> 3. Or should I just delete the default implemeantion, and let the target to >>> implement it. >> >> IMO no. This goes back to what we discussed earlier. It isn't the >> case that a default target hook has to be correct for all targets, >> with targets only overriding them as an optimisation. The default >> versions of many hooks and macros are not conservatively correct. >> They are just reaonable default assumptions. And IMO that's true >> of the hook above too. >> >> The way to flush out whether a target needs to override the hook >> is to add tests that run on all targets. > I planned to add these new test cases, so currently I have been testing the > simple testing cases on aarch64 and rs6000 to see any issue > With the default implementation. So far, I have found these issues with the > very simple testing cases. > > For me, at most I can test aarch64 and rs6000 targets for some small testing > cases for checking correctness. Thanks for testing other targets. But I don't think that invalidates what I said above. It might be that some of the targets you pick to test are ones that can't use the default implementation (or at least, not in all circumstances). At that point, hopefully target maintainers will step in to help. >> That said, one way of erring on the side of caution from an ICE >> perspective would be to do something like: >> >> rtx_insn *last_insn = get_last_insn (); >> rtx zero = CONST0_RTX (reg_raw_mode[regno]); >> rtx_insn *insn = emit_insn (gen_rtx_SET (regno_reg_rtx[regno], zero)); >> if (!valid_insn_p (insn)) >> { >> delete_insns_since (last_insn); >> ...remove regno from the set of cleared registers...; >> } >> >> where valid_insn_p abstracts out this code from ira.c: >> >> recog_memoized (move_insn); >> if (INSN_CODE (move_insn) < 0) >> continue; >> extract_insn (move_insn); >> /* We don't know whether the move will be in code that is optimized >> for size or speed, so consider all enabled alternatives. */ >> if (! constrain_operands (1, get_enabled_alternatives (move_insn))) >> continue; >> >> (but keeping the comment where it is). The default behaviour would then >> be to drop any register that can't be zeroed easily. > > I will check whether the above fix the ICE on rs6000. >> >> Doing this would make the default hook usable for more targets. >> The question is whether dropping registers that can't be zeroed >> easily is acceptable as a default policy for a security feature. > > If a “move_insn” is Not a valid insn per the above checking, can that “hard > register” still be zeroed? Sure. Like I said above, Power without VSX can zero FPRs by loading a zero from the constant pool. It just isn't a single instruction. >> >> I'm still a bit unsure what criteria are being applied to decide whether >> a register is worth zeroing or not, given that you were also talking about >> dropping mm0-7 and k0-7. > > Should leaving such decision to targets make more sense? The target gets the final say with both approaches. That's the point of making it a target hook. :-) But I don't think that really addresses my point. What I mean is: what are the criteria *in general* for deciding whether a particular register is worth clearing or not? And how do those criteria lead to the choices that you mentioned for x86? I think this is something that would be good to write down, so that targets can take a consistent approach. It would also increase the chances that clang and GCC are compatible, if clang ends up supporting the same feature. I was bitten by this with the stack-smash detection (-fstack-protector). GCC took the attitude that it was necessary to zero out the register that contains the canary value. LLVM took the attitude that it wasn't. I think both attitudes were “obvious” to the people that made them, but because the reasons weren't written down, it was hard to decide what to do when the two compilers took opposite positions. I thnk we're in danger of having the same problem here, either between compilers or between targets of the same compiler. If we have common, written-down ground rules for deciding which kinds of register should be included and which shouldn't, it will be easier to apply those rules to a given target. And it will be easier to decide on the right behaviour for the default implementation of the hook. It feels like the pushback against defining a default implementation of the hook is also a pushback against being specific about how targets are supposed to select which kinds of register need to be zeroed. Thanks, Richard