Hi Vlad,

Some comments on lra-spills.c and lra-coalesce.c.
> +   The pass creates necessary stack slots and assign spilled pseudos
> +   to the stack slots in following way:

s/assign/assigns/

> +   (or insn memory constraints) might be not satisfied any more.

s/might be not/might not be/

> +   For some targets, the pass can spill some pseudos into hard
> +   registers of different class (usually into vector registers)
> +   instead of spilling them into memory if it is possible and
> +   profitable.  Spilling GENERAL_REGS pseudo into SSE registers for
> +   modern Intel x86/x86-64 processors is an example of such
> +   optimization.  And this is actually recommended by Intel
> +   optimization guide.

Maybe mention core i7 specifically?  "Modern" is a bit dangerous
in code that'll live a long time.

> +/* The structure describes a stack slot which can be used for several
> +   spilled pseudos.  */
> +struct slot
> +{

Looks like this describes "a register or stack slot" given the
hard_regno case.

> +/* Array containing info about the stack slots.  The array element is
> +   indexed by the stack slot number in the range [0..slost_num).  */

Typo: slots_num

> +  /* Each pseudo has an inherent size which comes from its own mode,
> +     and a total size which provides room for paradoxical subregs
> +     which refer to the pseudo reg in wider modes.
> +
> +     We can use a slot already allocated if it provides both enough
> +     inherent space and enough total space.  Otherwise, we allocate a
> +     new slot, making sure that it has no less inherent space, and no
> +     less total space, then the previous slot.  */

The second part of the comment seems a bit misplaced, since the
following code doesn't reuse stack slots.  This is done elsewhere
instead.  Maybe the first part would be better above the inherent_size
assignment.

> +  /* If we have any adjustment to make, or if the stack slot is the
> +     wrong mode, make a new stack slot.  */
> +  x = adjust_address_nv (x, GET_MODE (regno_reg_rtx[i]), adjust);

We don't make a new slot here.

> +/* Sort pseudos according their slot numbers putting ones with smaller
> +   numbers first, or last when the frame pointer is not needed.  So
> +   pseudos with the first slot will be finally addressed with smaller
> +   address displacement.  */
> +static int
> +pseudo_reg_slot_compare (const void *v1p, const void *v2p)
> +{
> +  const int regno1 = *(const int *) v1p;
> +  const int regno2 = *(const int *) v2p;
> +  int diff, slot_num1, slot_num2;
> +  int total_size1, total_size2;
> +
> +  slot_num1 = pseudo_slots[regno1].slot_num;
> +  slot_num2 = pseudo_slots[regno2].slot_num;
> +  if ((diff = slot_num1 - slot_num2) != 0)
> +    return (frame_pointer_needed
> +            || !FRAME_GROWS_DOWNWARD == STACK_GROWS_DOWNWARD ? diff : -diff);

The comment doesn't quite describe the condition.  Maybe:

/* Sort pseudos according to their slots, putting the slots in the order
   that they should be allocated.  Slots with lower numbers have the
   highest priority and should get the smallest displacement from the
   stack or frame pointer (whichever is being used).

   The first allocated slot is always closest to the frame pointer,
   so prefer lower slot numbers when frame_pointer_needed.  If the stack
   and frame grow in the same direction, then the first allocated slot
   is always closest to the initial stack pointer and furthest away from
   the final stack pointer, so allocate higher numbers first when using
   the stack pointer in that case.  The reverse is true if the stack and
   frame grow in opposite directions.
*/

> +  total_size1 = MAX (PSEUDO_REGNO_BYTES (regno1),
> +                     GET_MODE_SIZE (lra_reg_info[regno1].biggest_mode));
> +  total_size2 = MAX (PSEUDO_REGNO_BYTES (regno2),
> +                     GET_MODE_SIZE (lra_reg_info[regno2].biggest_mode));
> +  if ((diff = total_size2 - total_size1) != 0)
> +    return diff;

I think this could do with a bit more commentary.  When is biggest_mode
ever smaller than PSEUDO_REGNO_BYTES?  Is that for pseudos that are only
ever referenced as lowpart subregs?  If so, why does PSEUDO_REGNO_BYTES
matter for those registers here but not when calculating biggest_mode?

> +/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS.  Put the
> +   pseudos which did not get a spill hard register at the beginning of
> +   array PSEUDO_REGNOS.  Return the number of such pseudos.  */

It'd be worth saying that PSEUDO_REGNOS is sorted in order of highest
frequency first.

> +  bitmap set_jump_crosses = regstat_get_setjmp_crosses ();

I notice you use "set_jump" here and "setjump" in parts of 7a.patch.
Probably better to use setjmp across the board.

> +  /* Hard registers which can not be used for any purpose at given
> +     program point because they are unallocatable or already allocated
> +     for other pseudos.  */
> +  HARD_REG_SET *reserved_hard_regs;
> +
> +  if (! lra_reg_spill_p)
> +    return n;
> +  /* Set up reserved hard regs for every program point.  */
> +  reserved_hard_regs = (HARD_REG_SET *) xmalloc (sizeof (HARD_REG_SET)
> +                                                 * lra_live_max_point);
> +  for (p = 0; p < lra_live_max_point; p++)
> +    COPY_HARD_REG_SET (reserved_hard_regs[p], lra_no_alloc_regs);
> +  for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
> +    if (lra_reg_info[i].nrefs != 0
> +        && (hard_regno = lra_get_regno_hard_regno (i)) >= 0)
> +      for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
> +        for (p = r->start; p <= r->finish; p++)
> +          lra_add_hard_reg_set (hard_regno, lra_reg_info[i].biggest_mode,
> +                                &reserved_hard_regs[p]);

Since compilation time seems to be all the rage, I wonder if it would
be quicker to have one live range list per hard register.  Then:

> +      for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
> +        for (p = r->start; p <= r->finish; p++)
> +          IOR_HARD_REG_SET (conflict_hard_regs, reserved_hard_regs[p]);

would just be checking for live range intersection and:

> +      /* Update reserved_hard_regs.  */
> +      for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
> +        for (p = r->start; p <= r->finish; p++)
> +          lra_add_hard_reg_set (hard_regno, lra_reg_info[regno].biggest_mode,
> +                                &reserved_hard_regs[p]);

would again be a merge.  Just an idea, not a merge requirement.
If you've already tried this and found it to be worse, that might be
worth a comment.
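To be concrete, this is roughly the sort of thing I had in mind
(completely untested, and the names are invented for the sake of the
example: spill_hard_reg_ranges and merge_live_range_lists don't exist
in the patch, lra_live_range_t is just my guess at the type of the
live_ranges field, and merge_live_range_lists would have to copy the
pseudo's ranges rather than splice them in destructively):

/* Made-up array: the program points at which each hard register has
   already been reserved, as a live range list.  */
static lra_live_range_t spill_hard_reg_ranges[FIRST_PSEUDO_REGISTER];

/* Return true if reserving HARD_REGNO for REGNO (in REGNO's biggest
   mode) would conflict with an earlier reservation.  */
static bool
spill_hard_reg_conflict_p (int hard_regno, int regno)
{
  int i, nregs;

  nregs = hard_regno_nregs[hard_regno][lra_reg_info[regno].biggest_mode];
  for (i = 0; i < nregs; i++)
    if (lra_intersected_live_ranges_p (spill_hard_reg_ranges[hard_regno + i],
                                       lra_reg_info[regno].live_ranges))
      return true;
  return false;
}

/* Record that HARD_REGNO has been chosen for REGNO, by merging REGNO's
   live ranges into the per-hard-register list (made-up helper).  */
static void
reserve_spill_hard_reg (int hard_regno, int regno)
{
  int i, nregs;

  nregs = hard_regno_nregs[hard_regno][lra_reg_info[regno].biggest_mode];
  for (i = 0; i < nregs; i++)
    spill_hard_reg_ranges[hard_regno + i]
      = merge_live_range_lists (spill_hard_reg_ranges[hard_regno + i],
                                lra_reg_info[regno].live_ranges);
}

That would avoid walking every program point in the pseudo's ranges for
both the conflict check and the update.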
> +  first = pseudo_slots[regno].first
> +    = &pseudo_slots[slots[slot_num].regno];
> +  pseudo_slots[regno].next = pseudo_slots[slots[slot_num].regno].next;
> +  first->next = &pseudo_slots[regno];

Very minor nit, but I think this would be easier to read if the middle
line also used "first->next".

> +/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS.  Put the
> +   pseudos which did not get a spill hard register at the beginning of
> +   array PSEUDO_REGNOS.  Return the number of such pseudos.  */

Here too I think it's worth mentioning that PSEUDO_REGNOS is sorted
with highest frequency first.

> +/* Recursively process LOC in INSN and change spilled pseudos to the
> +   corresponding memory or spilled hard reg.  Ignore spilled pseudos
> +   created from the scratches.  */
> +static bool
> +remove_pseudos (rtx *loc, rtx insn)

The return value is now ignored -- we know in advance which insns need
changing -- so this could be simplified.

> +/* Change spilled pseudos into memory or spill hard regs.  The
> +   function put changed insns on the constraint stack (these insns
> +   will be considered on the next constraint pass).  The changed insns
> +   are all insns in which pseudos were changed.  */

s/The function put/Put/

> +/* Set up REMOVED_PSEUDOS_BITMAP and USED_PSEUDOS_BITMAP, and update
> +   LR_BITMAP (a BB live info bitmap).  */
> +static void
> +update_live_info (bitmap lr_bitmap)
> +{
> +  unsigned int j;
> +  bitmap_iterator bi;
> +
> +  bitmap_clear (&removed_pseudos_bitmap);
> +  bitmap_clear (&used_pseudos_bitmap);
> +  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
> +                            FIRST_PSEUDO_REGISTER, j, bi)
> +    {
> +      bitmap_set_bit (&removed_pseudos_bitmap, j);
> +      bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
> +    }
> +  if (! bitmap_empty_p (&removed_pseudos_bitmap))
> +    {
> +      bitmap_and_compl_into (lr_bitmap, &removed_pseudos_bitmap);
> +      bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
> +    }
> +}

Might be wrong, but it looks like nothing really uses
removed_pseudos_bitmap outside this function.  I think this could
simply be:

/* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
   bitmap).  */
static void
update_live_info (bitmap lr_bitmap)
{
  unsigned int j;
  bitmap_iterator bi;

  bitmap_clear (&used_pseudos_bitmap);
  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
                            FIRST_PSEUDO_REGISTER, j, bi)
    bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
  if (! bitmap_empty_p (&used_pseudos_bitmap))
    {
      bitmap_and_compl_into (lr_bitmap, &coalesced_pseudos_bitmap);
      bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
    }
}

> +          && mem_move_p (sregno, dregno)
> +          /* Don't coalesce inheritance pseudos because spilled
> +             inheritance pseudos will be removed in subsequent 'undo
> +             inheritance' pass.  */
> +          && lra_reg_info[sregno].restore_regno < 0
> +          && lra_reg_info[dregno].restore_regno < 0
> +          /* We undo splits for spilled pseudos whose live ranges
> +             were split.  So don't coalesce them, it is not
> +             necessary and the undo transformations would be
> +             wrong.  */
> +          && ! bitmap_bit_p (&split_origin_bitmap, sregno)
> +          && ! bitmap_bit_p (&split_origin_bitmap, dregno)
> +          && ! side_effects_p (set)
> +          /* Don't coalesces bound pseudos.  Bound pseudos has own
> +             rules for finding live ranges.  It is hard to maintain
> +             this info with coalescing and it is not worth to do
> +             it.  */
> +          && ! bitmap_bit_p (&lra_bound_pseudos, sregno)
> +          && ! bitmap_bit_p (&lra_bound_pseudos, dregno)
> +          /* We don't want to coalesce regnos with equivalences,
> +             at least without updating this info.  */
> +          && ira_reg_equiv[sregno].constant == NULL_RTX
> +          && ira_reg_equiv[sregno].memory == NULL_RTX
> +          && ira_reg_equiv[sregno].invariant == NULL_RTX
> +          && ira_reg_equiv[dregno].constant == NULL_RTX
> +          && ira_reg_equiv[dregno].memory == NULL_RTX
> +          && ira_reg_equiv[dregno].invariant == NULL_RTX

Probably personal preference, but I think this would be easier to read as:

          && coalescable_reg_p (sregno)
          && coalescable_reg_p (dregno)
          && !side_effects_p (set)

with coalescable_reg_p checking reg_renumber (from mem_move_p) and the
open-coded stuff in the quote above.
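For the record, something like the following is what I was thinking of
(untested sketch; I'm assuming the reg_renumber test is the only part
of mem_move_p that applies to a single register):

/* Return true if pseudo REGNO is a candidate for coalescing.  */
static bool
coalescable_reg_p (int regno)
{
  return (/* We only coalesce spilled pseudos.  */
          reg_renumber[regno] < 0
          /* Don't coalesce inheritance pseudos because spilled
             inheritance pseudos will be removed in the subsequent
             'undo inheritance' pass.  */
          && lra_reg_info[regno].restore_regno < 0
          /* We undo splits for spilled pseudos whose live ranges were
             split, so don't coalesce them either.  */
          && ! bitmap_bit_p (&split_origin_bitmap, regno)
          /* Bound pseudos have their own rules for finding live ranges
             and are hard to keep up to date across coalescing.  */
          && ! bitmap_bit_p (&lra_bound_pseudos, regno)
          /* Don't coalesce pseudos with equivalences, at least without
             updating that information.  */
          && ira_reg_equiv[regno].constant == NULL_RTX
          && ira_reg_equiv[regno].memory == NULL_RTX
          && ira_reg_equiv[regno].invariant == NULL_RTX);
}

The caller would then only need the side_effects_p test on top of that
(plus whatever part of mem_move_p isn't per-register).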
> +  for (; mv_num != 0;)
> +    {
> +      for (i = 0; i < mv_num; i++)
> +        {
> +          mv = sorted_moves[i];
> +          set = single_set (mv);
> +          lra_assert (set != NULL && REG_P (SET_SRC (set))
> +                      && REG_P (SET_DEST (set)));
> +          sregno = REGNO (SET_SRC (set));
> +          dregno = REGNO (SET_DEST (set));
> +          if (! lra_intersected_live_ranges_p
> +              (lra_reg_info[first_coalesced_pseudo[sregno]].live_ranges,
> +               lra_reg_info[first_coalesced_pseudo[dregno]].live_ranges))
> +            {
> +              coalesced_moves++;
> +              if (lra_dump_file != NULL)
> +                fprintf
> +                  (lra_dump_file,
> +                   " Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
> +                   INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
> +                   dregno, ORIGINAL_REGNO (SET_DEST (set)),
> +                   BLOCK_FOR_INSN (mv)->frequency);
> +              bitmap_ior_into (&involved_insns_bitmap,
> +                               &lra_reg_info[sregno].insn_bitmap);
> +              bitmap_ior_into (&involved_insns_bitmap,
> +                               &lra_reg_info[dregno].insn_bitmap);
> +              merge_pseudos (sregno, dregno);
> +              i++;
> +              break;
> +            }
> +        }
> +      /* Collect the rest of copies.  */
> +      for (n = 0; i < mv_num; i++)
> +        {
> +          mv = sorted_moves[i];
> +          set = single_set (mv);
> +          lra_assert (set != NULL && REG_P (SET_SRC (set))
> +                      && REG_P (SET_DEST (set)));
> +          sregno = REGNO (SET_SRC (set));
> +          dregno = REGNO (SET_DEST (set));
> +          if (first_coalesced_pseudo[sregno] != first_coalesced_pseudo[dregno])
> +            sorted_moves[n++] = mv;
> +          else if (lra_dump_file != NULL)
> +            {
> +              coalesced_moves++;
> +              fprintf
> +                (lra_dump_file, " Coalescing move %i:r%d-r%d (freq=%d)\n",
> +                 INSN_UID (mv), sregno, dregno,
> +                 BLOCK_FOR_INSN (mv)->frequency);
> +            }
> +        }
> +      mv_num = n;

I'm probably being dense here, sorry, but why the nested loops?
Why can't we have one loop along the lines of:

  for (i = 0; i < mv_num; i++)
    {
      mv = sorted_moves[i];
      set = single_set (mv);
      lra_assert (set != NULL && REG_P (SET_SRC (set))
                  && REG_P (SET_DEST (set)));
      sregno = REGNO (SET_SRC (set));
      dregno = REGNO (SET_DEST (set));
      if (first_coalesced_pseudo[sregno] == first_coalesced_pseudo[dregno])
        {
          coalesced_moves++;
          if (lra_dump_file != NULL)
            fprintf
              (lra_dump_file, " Coalescing move %i:r%d-r%d (freq=%d)\n",
               INSN_UID (mv), sregno, dregno,
               BLOCK_FOR_INSN (mv)->frequency);
          /* We updated involved_insns_bitmap when doing the merge.  */
        }
      else if (!(lra_intersected_live_ranges_p
                 (lra_reg_info[first_coalesced_pseudo[sregno]].live_ranges,
                  lra_reg_info[first_coalesced_pseudo[dregno]].live_ranges)))
        {
          coalesced_moves++;
          if (lra_dump_file != NULL)
            fprintf
              (lra_dump_file,
               " Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
               INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
               dregno, ORIGINAL_REGNO (SET_DEST (set)),
               BLOCK_FOR_INSN (mv)->frequency);
          bitmap_ior_into (&involved_insns_bitmap,
                           &lra_reg_info[sregno].insn_bitmap);
          bitmap_ior_into (&involved_insns_bitmap,
                           &lra_reg_info[dregno].insn_bitmap);
          merge_pseudos (sregno, dregno);
        }
    }

(completely untested)

> +      if ((set = single_set (insn)) != NULL_RTX
> +          && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))
> +          && REGNO (SET_SRC (set)) == REGNO (SET_DEST (set))
> +          && ! side_effects_p (set))

Maybe use set_noop_p here?

Richard