Jeff Law wrote:

I was looking at a regression caused by having ira-reload utilize the existing copy detection code in IRA rather than my own and stumbled upon this...

Consider this insn prior to IRA:

(insn 72 56 126 8 j.c:744 (parallel [
            (set (reg:SI 110)
                (minus:SI (reg:SI 69 [ ew_u$parts$lsw ])
                    (reg:SI 68 [ ew_u$parts$lsw ])))
            (clobber (reg:CC 17 flags))
]) 290 {*subsi_1} (expr_list:REG_DEAD (reg:SI 69 [ ew_u$parts$lsw ])
        (expr_list:REG_DEAD (reg:SI 68 [ ew_u$parts$lsw ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

Which matches this pattern in the x86 backend:

(define_insn "*sub<mode>_1"
  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
        (minus:SWI
          (match_operand:SWI 1 "nonimmediate_operand" "0,0")
          (match_operand:SWI 2 "<general_operand>" "<r><i>,<r>m")))
   (clobber (reg:CC FLAGS_REG))]
  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "mode" "<MODE>")])


Note carefully that the constraints require operands 0 and 1 to match. Operand 2 is not tied to any other operand. In fact, if operand 2 is tied to operand0, then we are guaranteed to generate a reload and muck up the code pretty badly.


Now looking at the copies recorded by IRA we have:


  cp0:a0(r95)<->a1(r70)@248:move
  cp1:a4(r69)<->a6(r110)@178:constraint
  cp2:a3(r68)<->a6(r110)@22:shuffle
  cp3:a9(r66)<->a10(r92)@11:shuffle
  cp4:a11(r79)<->a12(r93)@89:move
  cp5:a1(r70)<->a14(r96)@114:constraint
  cp6:a1(r70)<->a13(r97)@114:constraint


Note carefully cp2 which claims a shuffle-copy between r68 and r110. ISTM that when trying to assign a hard reg to pseudo r110 that if r68 has a hard reg, but r69 does not, then pseudo r110 will show a cost savings if it is allocated into the same hard reg as pseudo r68.


The problematic code is add_insn_allocno_copies:
   {
      extract_insn (insn);
      for (i = 0; i < recog_data.n_operands; i++)
        {
          operand = recog_data.operand[i];
          if (REG_SUBREG_P (operand)
&& find_reg_note (insn, REG_DEAD,
                                REG_P (operand)
? operand : SUBREG_REG (operand)) != NULL_RTX)
            {
              str = recog_data.constraints[i];
              while (*str == ' ' || *str == '\t')
                str++;
              bound_p = false;
              for (j = 0, commut_p = false; j < 2; j++, commut_p = true)
                if ((dup = get_dup (i, commut_p)) != NULL_RTX
&& REG_SUBREG_P (dup)
&& process_regs_for_copy (operand, dup, true,
                                              NULL_RTX, freq))
                  bound_p = true;
              if (bound_p)
                continue;
              /* If an operand dies, prefer its hard register for the
                 output operands by decreasing the hard register cost
                 or creating the corresponding allocno copies.  The
                 cost will not correspond to a real move insn cost, so
                 make the frequency smaller.  */
              process_reg_shuffles (operand, i, freq < 8 ? 1 : freq / 8);
            }
        }


With r68 dying and not bound to an another operand, we create a reg-shuffle copy to encourage tying r68 to the output operand. Not good.

ISTM that if an output is already bound to some input that we should not be recording a copy between an unbound dying input and the bound output.

You can play with the attached (meaningless) testcase -O2 -m32 -fPIC. You won't see code quality regressions due to this problem on this testcase with the mainline sources, but it should be enough to trigger the bogus copy.

Thoughts?

Yes, Jeff, you are probably right that we should make the shuffle copy when there is already a constraint copy to the operand. There is no regression on the test because shuffle copy has always the smaller cost. I introduced the shuffle copies because they improved SPEC2000 rates for x86/x86_64 but I did not try to write a code to remove the situation you found.

I think we should try to remove shuffle copy (or even discorage such shuffle through negative copy cost) and see what happened to SPEC rates.

Reply via email to