Bernd Schmidt <ber...@codesourcery.com> writes:
> This adds the actual optimization, and reworks the JUMP_LABEL handling
> for return blocks. See the introduction mail or the new comment ahead of
> thread_prologue_and_epilogue_insns for more notes.

It seems a shame to have both (return) and (simple_return).  You said
that we need the distinction in order to cope with targets like ARM,
whose (return) instruction actually performs some of the epilogue too.
It feels like the load of the saved registers should really be expressed
in rtl, in parallel with the return.  I realise that'd prevent
conditional returns though.  Maybe there's no elegant way out...

With the hidden loads, it seems like we'll have a situation in which the
values of call-saved registers will appear to be different for different
"real" incoming edges to the exit block.

Is JUMP_LABEL ever null after this change?  (In fully-complete rtl
sequences, I mean.)  It looked like some of the null checks in the
patch might not be necessary any more.

JUMP_LABEL also seems somewhat misnamed after this change; maybe
JUMP_TARGET would be better?  I'm the last person who should be
recommending names though.

I know it's a pain, but it'd really help if you could split the
"JUMP_LABEL == a return rtx" stuff out.

I think it'd also be worth splitting the RETURN_ADDR_REGNUM bit out into
a separate patch, and handling other things in a more generic way.
E.g. the default INCOMING_RETURN_ADDR_RTX could then be:

  #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM)

and df.c:df_get_exit_block_use_set should include RETURN_ADDR_REGNUM
when epilogue_completed.

It'd be nice to handle cases in which all references to the stack pointer
are to the incoming arguments.  Maybe mention the fact that we don't as
another source of conservatism?

It'd also be nice to get rid of all these big blocks of code that are
conditional on preprocessor macros, but I realise you're just following
existing practice in the surrounding code, so again it can be left to
a future cleanup.

> @@ -1280,7 +1297,7 @@ force_nonfallthru_and_redirect (edge e, 
>  basic_block
>  force_nonfallthru (edge e)
>  {
> -  return force_nonfallthru_and_redirect (e, e->dest);
> +  return force_nonfallthru_and_redirect (e, e->dest, NULL_RTX);
>  }

Maybe assert here that e->dest isn't the exit block?  I realise it
will be caught by the:

    gcc_assert (jump_label == simple_return_rtx);

check, but an assert here would make it more obvious what had gone wrong.

> -  if (GET_CODE (x) == RETURN)
> +  if (GET_CODE (x) == RETURN || GET_CODE (x) == SIMPLE_RETURN)

ANY_RETURN_P (x).  A few other cases.

> @@ -5654,6 +5658,7 @@ init_emit_regs (void)
>    /* Assign register numbers to the globally defined register rtx.  */
>    pc_rtx = gen_rtx_fmt_ (PC, VOIDmode);
>    ret_rtx = gen_rtx_fmt_ (RETURN, VOIDmode);
> +  simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode);
>    cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
>    stack_pointer_rtx = gen_raw_REG (Pmode, STACK_POINTER_REGNUM);
>    frame_pointer_rtx = gen_raw_REG (Pmode, FRAME_POINTER_REGNUM);

It'd be nice to s/ret_rtx/return_rtx/ for consistency, but that can
happen anytime.

> +/* Return true if INSN requires the stack frame to be set up.  */
> +static bool
> +requires_stack_frame_p (rtx insn)
> +{
> +  HARD_REG_SET hardregs;
> +  unsigned regno;
> +
> +  if (!INSN_P (insn) || DEBUG_INSN_P (insn))
> +    return false;
> +  if (CALL_P (insn))
> +    return !SIBLING_CALL_P (insn);
> +  if (for_each_rtx (&PATTERN (insn), frame_required_for_rtx, NULL))
> +    return true;
> +  CLEAR_HARD_REG_SET (hardregs);
> +  note_stores (PATTERN (insn), record_hard_reg_sets, &hardregs);
> +  AND_COMPL_HARD_REG_SET (hardregs, call_used_reg_set);
> +  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if (TEST_HARD_REG_BIT (hardregs, regno)
> +     && df_regs_ever_live_p (regno))
> +      return true;

This can be done as a follow-up, but it looks like df should be using
a HARD_REG_SET here, and that we should be able to get at it directly.

> +       FOR_EACH_EDGE (e, ei, bb->preds)
> +         if (!bitmap_bit_p (&bb_antic_flags, e->src->index))
> +           {
> +             VEC_quick_push (basic_block, vec, e->src);
> +             bitmap_set_bit (&bb_on_list, e->src->index);
> +           }

&& !bitmap_bit_p (&bb_on_list, e->src->index) ?

> +     }
> +      while (!VEC_empty (basic_block, vec))
> +     {
> +       basic_block tmp_bb = VEC_pop (basic_block, vec);
> +       edge e;
> +       edge_iterator ei;
> +       bool all_set = true;
> +
> +       bitmap_clear_bit (&bb_on_list, tmp_bb->index);
> +       FOR_EACH_EDGE (e, ei, tmp_bb->succs)
> +         {
> +           if (!bitmap_bit_p (&bb_antic_flags, e->dest->index))
> +             {
> +               all_set = false;
> +               break;
> +             }
> +         }
> +       if (all_set)
> +         {
> +           bitmap_set_bit (&bb_antic_flags, tmp_bb->index);
> +           FOR_EACH_EDGE (e, ei, tmp_bb->preds)
> +             if (!bitmap_bit_p (&bb_antic_flags, e->src->index))
> +               {
> +                 VEC_quick_push (basic_block, vec, e->src);
> +                 bitmap_set_bit (&bb_on_list, e->src->index);
> +               }

same here.

> +         }
> +     }
> +      /* Find exactly one edge that leads to a block in ANTIC from
> +      a block that isn't.  */
> +      if (!bitmap_bit_p (&bb_antic_flags, entry_edge->dest->index))
> +     FOR_EACH_BB (bb)
> +       {
> +         if (!bitmap_bit_p (&bb_antic_flags, bb->index))
> +           continue;
> +         FOR_EACH_EDGE (e, ei, bb->preds)
> +           if (!bitmap_bit_p (&bb_antic_flags, e->src->index))
> +             {
> +               if (entry_edge != orig_entry_edge)
> +                 {
> +                   entry_edge = orig_entry_edge;
> +                   goto fail_shrinkwrap;
> +                 }
> +               entry_edge = e;
> +             }
> +       }

AIUI, this prevents the optimisation for things like

  if (a) {
    switch (b) {
      case 1:
        ...stuff that requires a frame...
        break;
      case 2:
        ...stuff that requires a frame...
        break;
      default:
        ...stuff that doesn't require a frame...
        break;
    }
  }

The switch won't be in ANTIC, but it will have two successors that are.
Is that right?

Would it work to do something like:

      FOR_EACH_BB (bb)
        {
          rtx insn;
          FOR_BB_INSNS (bb, insn)
            if (requires_stack_frame_p (insn))
              {
                bitmap_set_bit (&bb_flags, bb->index);
                break;
              }
        }
      if (bitmap_empty_p (bb_flags))
        ...no frame needed...
      else
        {
          calculate_dominance_info (CDI_DOMINATORS);
          bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bb_flags);
          if (bb == ENTRY_BLOCK_PTR)
            ...bleh...
          else
            ...insert prologue at the beginning of bb...
        }

?  Or (for a different trade-off) just use nearest_common_dominator
directly, and avoid the bitmap.

> @@ -5515,25 +5841,38 @@ thread_prologue_and_epilogue_insns (void
>        set_insn_locators (seq, epilogue_locator);
>  
>        seq = get_insns ();
> +      returnjump = get_last_insn ();
>        end_sequence ();
>  
> -      insert_insn_on_edge (seq, e);
> +      insert_insn_on_edge (seq, exit_fallthru_edge);
>        inserted = true;
> +      if (JUMP_P (returnjump))
> +     {
> +       rtx pat = PATTERN (returnjump);
> +       if (GET_CODE (pat) == PARALLEL)
> +         pat = XVECEXP (pat, 0, 0);
> +       if (ANY_RETURN_P (pat))
> +         JUMP_LABEL (returnjump) = pat;
> +       else
> +         JUMP_LABEL (returnjump) = ret_rtx;
> +     }
> +      else
> +     returnjump = NULL_RTX;

Does the "JUMP_LABEL (returnjump) = ret_rtx;" handle targets that
use things like (set (pc) (reg RA)) as their return?  Probably worth
adding a comment if so.

I didn't review much after this, because it was hard to sort the
simple_return stuff out from the "JUMP_LABEL can be a return rtx" change.

Richard

Reply via email to