On Fri, Jan 14, 2022 at 7:11 AM Hongyu Wang <wwwhhhyyy...@gmail.com> wrote:
>
> > > No, the approach is wrong. You have to solve output clearing at the RTL
> > > level; please look at how, e.g., the tzcnt false dependency is solved:
> >
> > Actually, we considered such an approach before, but we found we would
> > need to break up the original define_insn to remove the mask/rounding
> > subst, since define_split cannot use subst. That would add 6 more
> > define_insn_and_split patterns and 4 define_insn patterns for each
> > instruction. We think such an approach would introduce too much
> > redundant code.
> >
> > Do you think the code size increment is acceptable?
>
> Also, 100+ additional patterns would increase the maintenance effort. If
> we split them at the epilogue_completed stage, it seems not much
> different from putting the clearing in the output template...

In the proposed patch, if the output register is also mentioned in an input
operand, clearing it before the insn will also clear the input value. The
solution in i386.md takes care of this issue as well.

Uros.
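To make the overlap hazard concrete, here is a toy register-file model (a sketch, not GCC code). It assumes a tzcnt-like operation (dst = number of trailing zero bits of src, 64 for a zero input). The function names and the dictionary-based register file are hypothetical. It compares unconditionally zeroing the destination first against only zeroing it when the destination is not also an input, which is the kind of overlap check the i386.md false-dependency splitters are described as doing above.

```python
# Hypothetical toy register file: register name -> 64-bit integer value.
regs = {}

def tzcnt(dst, src):
    """Model of tzcnt: dst = count of trailing zero bits in src (64 if src == 0)."""
    v = regs[src]
    regs[dst] = 64 if v == 0 else (v & -v).bit_length() - 1

def tzcnt_clear_always(dst, src):
    """Break the false dependency by zeroing dst first, without checking operands."""
    regs[dst] = 0  # models "xor dst, dst" emitted before the insn
    tzcnt(dst, src)

def tzcnt_clear_guarded(dst, src):
    """Zero dst only when it is not also an input operand."""
    if dst != src:
        regs[dst] = 0
    tzcnt(dst, src)
```

With `regs["rax"] = 8` (three trailing zeros), `tzcnt_clear_always("rax", "rax")` leaves 64 in `rax` because the clear wiped the input, while `tzcnt_clear_guarded("rax", "rax")` leaves the expected 3.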