Quoting Richard Sandiford <rdsandif...@googlemail.com>:

Joern Rennecke <joern.renne...@embecosm.com> writes:
2012-10-04  Joern Rennecke  <joern.renne...@embecosm.com>

         * final.c (get_attr_length_1): Use direct recursion rather than
         calling get_attr_length.
         (get_attr_lock_length): New function.
         (INSN_VARIABLE_LENGTH_P): Define.
         (shorten_branches): Take HAVE_ATTR_lock_length into account.
         Don't overwrite non-delay slot insn lengths with the lengths of
         delay slot insns with same uid.
         * genattrtab.c (lock_length_str): New variable.
         (make_length_attrs): New parameter base.
         (main): Initialize lock_length_str.
         Generate lock_lengths attributes.
         * genattr.c (gen_attr): Emit declarations for lock_length attribute
        related functions.
        * doc/md.texi (node Insn Lengths): Document lock_length attribute.

http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00383.html

Sorry, this is really just repeating Richard B's comments, but I still
don't understand why we need two attributes.  Why can't shorten_branches
work with the existing length and simply make sure that the length doesn't
decrease from one iteration to the next?  That seems to be how you implement
CASE_VECTOR_SHORTEN_MODE.  It also means that we can continue to use the
pessimistic algorithm for -O0.

That does not really work for ARCompact: non-branch instructions are
lengthened to align or un-align branch instructions.  Failing to
correct the size of such an instruction downwards when indicated not
only messes up the directly affected branch, but also gets the alignment
wrong downstream, and can even cause branches to go out of range
(blindsiding branch shortening) when there are interactions with explicit
alignments for things like loops.
Unless you think it's OK to poke the contents of the insn_lengths array from
ADJUST_INSN_LENGTH.  That should work, but it would require an extra parameter
when you want to hookize this macro.
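
Hypothetically, a hookized version with such an extra parameter could look
roughly like this (names made up, purely to illustrate what I mean):

/* Hypothetical hook replacing the ADJUST_INSN_LENGTH macro.  The extra
   UID_LENGTHS parameter (the insn_lengths array from shorten_branches)
   would let the port correct the recorded length of other insns, e.g. a
   branch whose range is affected by alignment padding.  */
static int
port_adjust_insn_length (rtx insn, int length, int *uid_lengths)
{
  /* Port-specific adjustments go here; something like
       uid_lengths[INSN_UID (other_insn)] = other_len;
     is what "poking the array" would amount to.  */
  return length;
}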

You said in your reply that one of the reasons was to avoid
"interesting" interactions with ADJUST_INSN_LENGTH.  But the
previous minimum length would be applied after ADJUST_INSN_LENGTH,
so I'm not sure why it's a factor.

Well, for starters, the ARCompact ADJUST_INSN_LENGTH uses
get_attr_lock_length when processing delayed-branch SEQUENCEs.
And then there is the short/long instruction opcode distinction (function /
attribute verify_short), which depends on alignment and influences
instruction length and conditional execution calculation.
Without exact alignment information, branch scheduling becomes a crapshoot.
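
To give an idea of the shape of the problem (this is not the actual ARC
code, just an illustration):

/* Illustration only: suppose the short (2-byte) encoding of some insn is
   only usable when the insn starts on a 32-bit boundary.  Its length then
   depends on its address, so a wrong address estimate yields a wrong
   length, which in turn shifts the addresses of everything that follows.  */
#define ADJUST_INSN_LENGTH(INSN, LENGTH)                        \
  do {                                                          \
    if (INSN_ADDRESSES (INSN_UID (INSN)) % 4 != 0)              \
      (LENGTH) += 2;        /* forced into the long encoding */ \
  } while (0)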

You might think I could fold the length effect of the upsized instructions
into the affected branch, but that doesn't work because the exact length is
needed both for ADJUST_INSN_LENGTH and in order to output the correct
branch / jump output template.
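
For illustration, the typical shape of such an output function (hypothetical,
not the actual ARC one):

/* Hypothetical output function for a conditional branch: which template is
   emitted depends on the exact length that branch shortening computed for
   this insn, so the recorded length must be the branch's own, not the
   branch's length plus padding folded in from neighbouring insns.  */
static const char *
output_cbranch_sketch (rtx insn)
{
  if (get_attr_length (insn) == 4)
    return "b%d1\t%l0";                 /* target in range: direct branch */
  return "b%D1\t1f\n\tj\t%l0\n1:";      /* out of range: branch around a jump */
}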


If lock_length is just an optimisation on top of that, then maybe
it would help to split it out.

Well, to the extent that branch shortening and scheduling are just
optimizations, having the lock_length attribute is just an optimization.

Well, we could split it anyway, and give ports that have no need for
multiple length attributes the benefit of the optimistic algorithm.

I have attached a patch that implements this.
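
The gist of the change, written here as a hypothetical helper just to
summarize what the patch below does in the main loop:

/* When optimizing, a recorded length may only grow after the first pass,
   never shrink, so the iteration cannot oscillate and settles at the
   smallest consistent length for each insn.  */
static void
record_length (int *insn_lengths, int uid, int new_length,
               bool first_pass, int *something_changed)
{
  if (new_length != insn_lengths[uid]
      && (new_length > insn_lengths[uid] || first_pass))
    {
      insn_lengths[uid] = new_length;
      *something_changed = 1;
    }
}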

I first meant to test it for sh-elf, but PR54938 currently makes this
impossible.
So I used arm-eabi instead.  Regression tests look OK; libgcc.a and
libc.a (newlib) are unchanged in size, while libstdc++.a increases by twelve
bytes; more specifically, the size increase happens in the object file
debug.o, which goes from 11028 to 11040 bytes.
Likewise, cris-elf and mipsel-elf show a small increase in the size of
debug.o, with the rest of libstdc++.a unchanged in size.

Generating and comparing the arm intermediate files debug.s shows a suspicious
incidence of pathnames.  Using a longer build directory pathname makes no
difference, though.
OTOH, using a longer source directory pathname does.  So that leaves...
At any rate, the patch makes no difference for building these libraries.
It should stop endless looping in shorten_branches, though, albeit that is
only relevant for ports that do have some negative branch shortening feedback
(i.e. cases where shrinking one instruction can cause another one to grow, so
lengths do not settle monotonically).
2012-10-16  J"orn Rennecke  <joern.renne...@arc.com>

        * final.c (shorten_branches): When optimizing, start with small
        length and increase from there, and don't decrease lengths.
        Don't overwrite non-delay slot insn lengths with the lengths of
        delay slot insns with same uid.

Index: final.c
===================================================================
--- final.c     (revision 192491)
+++ final.c     (working copy)
@@ -1067,6 +1067,8 @@ shorten_branches (rtx first ATTRIBUTE_UN
 #endif /* CASE_VECTOR_SHORTEN_MODE */
 
   /* Compute initial lengths, addresses, and varying flags for each insn.  */
+  int (*length_fun) (rtx) = optimize ? insn_min_length : insn_default_length;
+
   for (insn_current_address = 0, insn = first;
        insn != 0;
        insn_current_address += insn_lengths[uid], insn = NEXT_INSN (insn))
@@ -1117,6 +1119,8 @@ shorten_branches (rtx first ATTRIBUTE_UN
 #else
          const_delay_slots = 0;
 #endif
+         int (*inner_length_fun) (rtx)
+           = const_delay_slots ? length_fun : insn_default_length;
          /* Inside a delay slot sequence, we do not do any branch shortening
             if the shortening could change the number of delay slots
             of the branch.  */
@@ -1131,7 +1135,7 @@ shorten_branches (rtx first ATTRIBUTE_UN
                inner_length = (asm_insn_count (PATTERN (inner_insn))
                                * insn_default_length (inner_insn));
              else
-               inner_length = insn_default_length (inner_insn);
+               inner_length = inner_length_fun (inner_insn);
 
              insn_lengths[inner_uid] = inner_length;
              if (const_delay_slots)
@@ -1149,7 +1153,7 @@ shorten_branches (rtx first ATTRIBUTE_UN
        }
       else if (GET_CODE (body) != USE && GET_CODE (body) != CLOBBER)
        {
-         insn_lengths[uid] = insn_default_length (insn);
+         insn_lengths[uid] = length_fun (insn);
          varying_length[uid] = insn_variable_length_p (insn);
        }
 
@@ -1165,6 +1169,7 @@ shorten_branches (rtx first ATTRIBUTE_UN
      get the current insn length.  If it has changed, reflect the change.
      When nothing changes for a full pass, we are done.  */
 
+  bool first_pass = true;
   while (something_changed)
     {
       something_changed = 0;
@@ -1220,6 +1225,7 @@ shorten_branches (rtx first ATTRIBUTE_UN
              rtx prev;
              int rel_align = 0;
              addr_diff_vec_flags flags;
+             enum machine_mode vec_mode;
 
              /* Avoid automatic aggregate initialization.  */
              flags = ADDR_DIFF_VEC_FLAGS (body);
@@ -1298,9 +1304,12 @@ shorten_branches (rtx first ATTRIBUTE_UN
                  else
                    max_addr += align_fuzz (max_lab, rel_lab, 0, 0);
                }
-             PUT_MODE (body, CASE_VECTOR_SHORTEN_MODE (min_addr - rel_addr,
-                                                       max_addr - rel_addr,
-                                                       body));
+             vec_mode = CASE_VECTOR_SHORTEN_MODE (min_addr - rel_addr,
+                                                  max_addr - rel_addr, body);
+             if (first_pass
+                 || (GET_MODE_SIZE (vec_mode)
+                     >= GET_MODE_SIZE (GET_MODE (body))))
+               PUT_MODE (body, vec_mode);
              if (JUMP_TABLES_IN_TEXT_SECTION
                  || readonly_data_section == text_section)
                {
@@ -1360,12 +1369,20 @@ shorten_branches (rtx first ATTRIBUTE_UN
                  else
                    inner_length = insn_current_length (inner_insn);
 
-                 if (inner_length != insn_lengths[inner_uid])
+                 /* We can't record lengths of delay slot insns, because
+                    they might just be a copy of the insn at the branch
+                    target, sharing its uid.  */
+                 if (i == 0 && inner_length != insn_lengths[inner_uid])
                    {
-                     insn_lengths[inner_uid] = inner_length;
-                     something_changed = 1;
+                     if (inner_length > insn_lengths[inner_uid] || first_pass)
+                       {
+                         insn_lengths[inner_uid] = inner_length;
+                         something_changed = 1;
+                       }
+                     else
+                       inner_length = insn_lengths[inner_uid];
                    }
-                 insn_current_address += insn_lengths[inner_uid];
+                 insn_current_address += inner_length;
                  new_length += inner_length;
                }
            }
@@ -1382,7 +1399,8 @@ shorten_branches (rtx first ATTRIBUTE_UN
          insn_current_address += (new_length - tmp_length);
 #endif
 
-         if (new_length != insn_lengths[uid])
+         if (new_length != insn_lengths[uid]
+             && (new_length > insn_lengths[uid] || first_pass))
            {
              insn_lengths[uid] = new_length;
              something_changed = 1;
@@ -1391,6 +1409,7 @@ shorten_branches (rtx first ATTRIBUTE_UN
       /* For a non-optimizing compile, do only a single pass.  */
       if (!optimize)
        break;
+      first_pass = false;
     }
 
   free (varying_length);
