On Tue, Jul 25, 2017 at 11:25:25AM +0200, Robin Dapp wrote:
> Some questions/remarks, comments welcome:
>  - ifcvt performs creates things that it expects other passes to clean
> up afterwards.  In the case of the superfluous compares this might be
> possible but the code is actually wrong and we cannot rely on a pass
> fixing wrong code.

Do you have an example where wrong code is generated through the
noce_convert_multiple_sets_p path (with or without bodged costs)?

Both AArch64 and x86-64 reject your testcase along this codepath because
of the constant set of 1. If we work around that by setting bla = n rather
than bla = 1 , I see this code generation for AArch64:

  (insn 16 15 56 4 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 73 [ _4 ])
            (reg/v:SI 77 [ <retval> ]))) "foo.c":7 393 {cmpsi}
     (nil))
  (insn 56 16 57 4 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 73 [ _4 ])
            (reg/v:SI 77 [ <retval> ]))) "foo.c":10 393 {cmpsi}
     (nil))
  (insn 57 56 58 4 (set (reg:SI 81)
        (if_then_else:SI (ge (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg/v:SI 74 [ blaD.3132 ])
            (reg/v:SI 79 [ nD.3128 ]))) "foo.c":10 445 {*cmovsi_insn}
     (nil))
  (insn 58 57 59 4 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 73 [ _4 ])
            (reg/v:SI 77 [ <retval> ]))) "foo.c":10 393 {cmpsi}
     (nil))
  (insn 59 58 60 4 (set (reg:SI 82)
        (if_then_else:SI (ge (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg/v:SI 77 [ <retval> ])
            (reg:SI 73 [ _4 ]))) "foo.c":10 445 {*cmovsi_insn}
     (nil))

i.e. we have fresh registers. As you note, we do expect a later pass to
clean up the redundant compare of (reg:SI 73) (reg:SI 77), which can
throw the cost calculation off.

If I hack up ix86_noce_conversion_profitable_p to alwyas return true, then
for your testcase I also see:

.L4:
        movl    (%rdi), %ecx
        cmpl    %eax, %ecx
        cmovl   %esi, %edx
        cmovl   %ecx, %eax
        addq    $4, %rdi
        cmpq    %r8, %rdi
        jne     .L4

Again, no overlap.

The path through cond_move_process_if_block is less interesting to me,
as you're only getting incorrect code after disabling a correctness check.

What am I missing in the noce_convert_multiple_sets_p case?

>  - The extraction of the compare from the conditional move is pretty
> ad-hoc and pattern-dependent currently.  Is there a way to do it in a
> more backend-independent fashion?
>  - Branch mispredict costs don't seem to play a major role in ifcvt.
> Shouldn't they be accounted for in all cases, maybe weighted by bb
> probabilities?

They are accounted for through if_info->max_seq_cost, which is set as so:

  if_info.max_seq_cost = targetm.max_noce_ifcvt_seq_cost (then_edge);

Where max_noce_ifcvt_seq_cost is normally the default implementation, which
looks like:

  unsigned int
  default_max_noce_ifcvt_seq_cost (edge e)
  {
    bool predictable_p = predictable_edge_p (e);

    enum compiler_param param
      = (predictable_p
         ? PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST
         : PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST);

    /* If we have a parameter set, use that, otherwise take a guess using
       BRANCH_COST.  */
    if (global_options_set.x_param_values[param])
      return PARAM_VALUE (param);
    else
      return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (3);
  }

The i386 version looks a lot like this but with a lower COSTS_N_INSNS magic
number.

Thanks,
James

Reply via email to