On Thu, Jan 7, 2021 at 10:14 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Thu, Jan 7, 2021 at 9:56 AM Jakub Jelinek <ja...@redhat.com> wrote: > > > > Hi! > > > > The BLSI instruction sets SF and ZF based on the result and clears OF. > > CF is set to something unrelated. > > > > The following patch optimizes BLSI followed by comparison, so we don't need > > to emit a TEST insn in between. > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > > 2021-01-07 Jakub Jelinek <ja...@redhat.com> > > > > PR target/98567 > > * config/i386/i386.md (*bmi_blsi_<mode>_cmp): New define_insn. > > > > * gcc.target/i386/pr98567-1.c: New test. > > * gcc.target/i386/pr98567-2.c: New test. > > > > --- gcc/config/i386/i386.md.jj 2021-01-04 10:25:45.072163178 +0100 > > +++ gcc/config/i386/i386.md 2021-01-06 17:49:13.251966127 +0100 > > @@ -14568,6 +14568,21 @@ (define_insn "*bmi_blsi_<mode>" > > (set_attr "btver2_decode" "double") > > (set_attr "mode" "<MODE>")]) > > > > +(define_insn "*bmi_blsi_<mode>_cmp" > > + [(set (reg FLAGS_REG) > > + (compare > > + (and:SWI48 > > + (neg:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm")) > > + (match_dup 1)) > > + (const_int 0))) > > + (set (match_operand:SWI48 0 "register_operand" "=r") > > + (and:SWI48 (neg:SWI48 (match_dup 1)) (match_dup 1)))] > > + "TARGET_BMI && ix86_match_ccmode (insn, CCNOmode)" > > + "blsi\t{%1, %0|%0, %1}" > > + [(set_attr "type" "bitmanip") > > + (set_attr "btver2_decode" "double") > > + (set_attr "mode" "<MODE>")]) > > I wonder if we should also add _cc variant where scratch is used:
Er, _ccno in this particular case. > (define_insn "*bmi_blsi_<mode>_cc" > [(set (reg FLAGS_REG) > (compare > (and:SWI48 > (neg:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm")) > (match_dup 1)) > (const_int 0))) > (clobber (match_scratch:SWI48 0 "=r"))] > "TARGET_BMI && ix86_match_ccmode (insn, CCNOmode)" > "blsi\t{%1, %0|%0, %1}" > [(set_attr "type" "bitmanip") > (set_attr "btver2_decode" "double") > (set_attr "mode" "<MODE>")]) > > The output is unused in the testcases, so there may be no difference > in the generated code, but it looks to me that additional pattern > gives the compiler more freedom. Also note, that all other CC setting > insns come in three variants. If I'd have to choose between _cmp and _ccno variants, I'd prefer __ccno one (with the scratch), since output and flags are seldom used together (and when they do, a new pack of optimization problems arises, e.g. recently mentioned global RTX CSE issue). Uros.