https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614

--- Comment #10 from kugan at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #9)
> > > As mentioned by Andrew, it is important to clone and also resolve indirect
> > > calls. Those auto-FDO 0 counts may prevent this from happening.
> > > It is easy to see in a perf profile whether the functions are cloned.
> > > 
> > > My overall plan is to combine AutoFDO with the guessed profile when AutoFDO
> > > samples are missing (i.e. we have 0 at input). There is no 100% correct way
> > > to do so, which is why I am first trying to get benchmarking set up and
> > > kind of working, and only then start tampering with profile generation.
> > 
> > Thanks for the information. I tried re-creating the same configuration and
> > the results unfortunately are the same. I will look at the dumps further.
> 
> I will check whether I can reproduce that ICE with to_sreal. It means that
> the counts are not compatible, but it is not clear from the backtrace why.
> 
> Note that I think the main problem is that the code producing the BB profile
> assigns AutoFDO 0 counts to all BBs it cannot successfully propagate to,
> which means that they are not optimized for performance later. For
> example, it tends to prevent unrolling. If a loop was unrolled in the train
> run, we will not have enough info to determine its iteration count and we
> may leave count 0 in the header of the loop.
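
A zero header count blocks unrolling, and a simplified model makes the reason concrete. This model is not GCC's actual loop-iteration estimate. It assumes that the average number of header executions per loop entry is the header count divided by the loop entry count. `expected_header_runs`, `worth_unrolling` and the threshold of 4 are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified model (an assumption, not GCC's loop-iteration code): the
// average number of header executions per loop entry is the header count
// divided by the count of the edge entering the loop.
std::optional<double> expected_header_runs (std::uint64_t header_count,
                                            std::uint64_t entry_count)
{
  // No samples on the loop entry: nothing can be derived.
  if (entry_count == 0)
    return std::nullopt;
  return static_cast<double> (header_count)
         / static_cast<double> (entry_count);
}

// Hypothetical heuristic: unroll only if the header is expected to run
// more than min_runs times per entry.
bool worth_unrolling (std::uint64_t header_count, std::uint64_t entry_count,
                      double min_runs = 4.0)
{
  std::optional<double> runs = expected_header_runs (header_count, entry_count);
  return runs.has_value () && *runs > min_runs;
}
```

When the header count is 0, the estimate is also 0. The loop then looks as if it never iterates, so this heuristic declines to unroll it, even if the loop is hot in reality.
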
> 
> We need to fill in the data from static profile and indicate that in the
> count->quality () (i.e. GUESSED versus AFDO).
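
Here is a minimal sketch of that idea. It assumes per-BB AutoFDO counts in which 0 means "no samples", plus a static estimate that has already been scaled to the same units. `Quality`, `Count` and `merge_profiles` are hypothetical and much simpler than GCC's profile_count. As the earlier comment says, a 0 can also mean a genuinely cold block, so this naive rule is not 100% correct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for profile_count qualities.
enum class Quality { Guessed, AFDO };

struct Count
{
  std::uint64_t value;
  Quality quality;
};

// For each basic block, keep the AutoFDO count when it is nonzero;
// otherwise fall back to the statically guessed count (assumed to be in
// the same units) and tag it Guessed so later passes can tell them apart.
std::vector<Count> merge_profiles (const std::vector<std::uint64_t> &afdo,
                                   const std::vector<std::uint64_t> &guessed)
{
  assert (afdo.size () == guessed.size ());
  std::vector<Count> out;
  out.reserve (afdo.size ());
  for (std::size_t i = 0; i < afdo.size (); ++i)
    {
      if (afdo[i] != 0)
        out.push_back ({afdo[i], Quality::AFDO});
      else
        out.push_back ({guessed[i], Quality::Guessed});
    }
  return out;
}
```
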
> 
> I looked into the propagation algorithm yesterday and made it
> propagate even if the info is not complete. This is not quite a correct
> solution, but it mitigates the problem and reduces the performance gap from
> 4% to 2% in my SPEC runs.
> 
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> index e12b3048f20..9ebd1a203fe 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1307,21 +1307,47 @@ afdo_propagate_edge (bool is_succ, bb_set *annotated_bb)
>         total_known_count += AFDO_EINFO (e)->get_count ();
>       num_edge++;
>        }
> +    if (dump_file)
> +      {
> +     fprintf (dump_file, "bb %i annotated %i dir %s edges %i, "
> +              "unknown edges %i, known count ",
> +              bb->index, is_bb_annotated (bb, *annotated_bb),
> +              is_succ ? "successors" : "predecessors", num_edge, num_unknown_edge);
> +     total_known_count.dump (dump_file);
> +     fprintf (dump_file, " bb count ");
> +     bb->count.dump (dump_file);
> +     fprintf (dump_file, "\n");
> +      }
> +    if (total_known_count > bb->count)
> +      {
> +     if (dump_file)
> +       {
> +         fprintf (dump_file, "  Updating count of bb %i ", bb->index);
> +         bb->count.dump (dump_file);
> +         fprintf (dump_file, " -> ");
> +         total_known_count.dump (dump_file);
> +         fprintf (dump_file, "\n");
> +       }
> +     bb->count = total_known_count;
> +     changed = true;
> +      }
>  
>      /* Be careful not to annotate block with no successor in special cases.  */
> -    if (num_unknown_edge == 0 && total_known_count > bb->count)
> +    if (num_unknown_edge == 0 && num_edge
> +     && !is_bb_annotated (bb, *annotated_bb))
>        {
> -     bb->count = total_known_count;
> -     if (!is_bb_annotated (bb, *annotated_bb))
> -       set_bb_annotated (bb, annotated_bb);
> +     if (dump_file)
> +       fprintf (dump_file, "  Setting bb %i annotated\n", bb->index);
> +     set_bb_annotated (bb, annotated_bb);
>       changed = true;
>        }
>      else if (num_unknown_edge == 1 && is_bb_annotated (bb, *annotated_bb))
>        {
>       if (bb->count > total_known_count)
>         {
> -           profile_count new_count = bb->count - total_known_count;
> -           AFDO_EINFO(unknown_edge)->set_count(new_count);
> +         profile_count new_count = bb->count - total_known_count;
> +         AFDO_EINFO(unknown_edge)->set_count(new_count);
> +#if 0
>             if (num_edge == 1)
>               {
>                 basic_block succ_or_pred_bb = is_succ ? unknown_edge->dest : unknown_edge->src;
> @@ -1332,12 +1358,41 @@ afdo_propagate_edge (bool is_succ, bb_set *annotated_bb)
>                       set_bb_annotated (succ_or_pred_bb, annotated_bb);
>                   }
>               }
> +#endif
>          }
>       else
>         AFDO_EINFO (unknown_edge)->set_count (profile_count::zero().afdo ());
> +     if (dump_file)
> +       {
> +         fprintf (dump_file, "  Annotated edge %i->%i with count ",
> +                  unknown_edge->src->index, unknown_edge->dest->index);
> +         AFDO_EINFO (unknown_edge)->get_count ().dump (dump_file);
> +         fprintf (dump_file, "\n");
> +       }
>       AFDO_EINFO (unknown_edge)->set_annotated ();
>       changed = true;
>        }
> +    else if (total_known_count >= bb->count
> +          && num_unknown_edge > 1
> +          && is_bb_annotated (bb, *annotated_bb))
> +      {
> +     FOR_EACH_EDGE (e, ei, is_succ ? bb->succs : bb->preds)
> +       {
> +         gcc_assert (AFDO_EINFO (e) != NULL);
> +         if (! AFDO_EINFO (e)->is_annotated ())
> +           {
> +             AFDO_EINFO(e)->set_count (profile_count::zero().afdo ());
> +             AFDO_EINFO (e)->set_annotated ();
> +             if (dump_file)
> +               {
> +                 fprintf (dump_file, "  Annotated edge %i->%i with count ",
> +                          e->src->index, e->dest->index);
> +                 AFDO_EINFO (e)->get_count ().dump (dump_file);
> +                 fprintf (dump_file, "\n");
> +               }
> +           }
> +       }
> +      }
>    }
>    return changed;
>  }
> @@ -1471,6 +1526,8 @@ afdo_propagate (bb_set *annotated_bb)
>          changed = true;
>        afdo_propagate_circuit (*annotated_bb);
>      }
> +  if (changed && dump_file)
> +    fprintf (dump_file, "Limit of 10 iterations reached\n");
>  }
>  
>  /* Propagate counts on control flow graph and calculate branch
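
The per-block rules of the patched afdo_propagate_edge () can be modeled on their own. The sketch below simplifies in two ways: it uses plain integer counts instead of profile_count, and each block looks at only one edge direction. `Edge`, `Block` and `propagate_edges` are hypothetical names. There is one deliberate difference from the quoted patch. The patch does not set `changed` in its last branch, where several unknown edges are zeroed. This sketch does set it, so a caller that iterates to a fixed point would see the update.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model of the per-block rules in the patched
// afdo_propagate_edge (): integers instead of profile_count, and the
// block sees only one direction (successor or predecessor edges).
struct Edge
{
  std::uint64_t count = 0;
  bool annotated = false;
};

struct Block
{
  std::uint64_t count = 0;
  bool annotated = false;
  std::vector<Edge *> edges;
};

bool propagate_edges (Block &bb)
{
  bool changed = false;
  std::uint64_t total_known = 0;
  int num_unknown = 0;
  Edge *unknown = nullptr;
  for (Edge *e : bb.edges)
    if (e->annotated)
      total_known += e->count;
    else
      {
        ++num_unknown;
        unknown = e;
      }

  // Known edges carry more than the block: raise the block count.
  if (total_known > bb.count)
    {
      bb.count = total_known;
      changed = true;
    }

  if (num_unknown == 0 && !bb.edges.empty () && !bb.annotated)
    {
      // Every edge is known, so the block count can be trusted.
      bb.annotated = true;
      changed = true;
    }
  else if (num_unknown == 1 && bb.annotated)
    {
      // Exactly one unknown edge: it receives the leftover count (or 0).
      unknown->count = bb.count > total_known ? bb.count - total_known : 0;
      unknown->annotated = true;
      changed = true;
    }
  else if (num_unknown > 1 && bb.annotated && total_known >= bb.count)
    {
      // Known edges already account for the whole block, so the
      // remaining unknown edges must carry zero.
      for (Edge *e : bb.edges)
        if (!e->annotated)
          {
            e->count = 0;
            e->annotated = true;
          }
      changed = true;
    }
  return changed;
}
```
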

This helps with performance. However, I still get the ICE reported above.

#2  0x0000000001917858 in profile_count::to_sreal_scale (this=0xffffefcd0958, in=..., known=0x0) at ../../gcc/gcc/profile-count.cc:342
342       gcc_checking_assert (compatible_p (in));
(gdb) p *this
$4 = {static n_bits = 61, static max_count = 2305843009213693950, static uninitialized_count = 2305843009213693951, m_val = 1073741820, m_quality = GUESSED_LOCAL}
(gdb) p in
$5 = {static n_bits = 61, static max_count = 2305843009213693950, static uninitialized_count = 2305843009213693951, m_val = 1230, m_quality = AFDO}

This profile was collected with SPE, which can yield far fewer samples than
LBR. However, I think the ICE can still occur in general.
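
The gdb output suggests a mismatch between the two counts. `this` still carries a function-local static estimate (GUESSED_LOCAL), while `in` is an AutoFDO count, so compatible_p () rejects the pair. The model below is a minimal sketch of that check, built on simplifying assumptions. The quality subset, `ipa_p`, `compatible_p` and `scale` are hypothetical stand-ins, not GCC's profile_count API. The real rule in profile-count.h handles more cases. The test values are taken from the gdb session above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified subset of profile_count qualities (an assumption modeled on
// gcc/profile-count.h, not a copy of it).
enum class Quality { Uninitialized, GuessedLocal, Guessed, AFDO, Precise };

struct Count
{
  std::uint64_t val;
  Quality quality;
};

// A function-local guess is only meaningful relative to other counts in
// the same function; the other qualities are treated as IPA-level here.
bool ipa_p (const Count &c)
{
  return c.quality != Quality::GuessedLocal;
}

// Core of the rule: never mix local guesses with IPA-level counts such as
// AFDO. Uninitialized counts are accepted.
bool compatible_p (const Count &a, const Count &b)
{
  if (a.quality == Quality::Uninitialized
      || b.quality == Quality::Uninitialized)
    return true;
  return ipa_p (a) == ipa_p (b);
}

// Rough analogue of to_sreal_scale (): the ratio num / den. Where GCC
// trips its checking assert, this sketch returns std::nullopt; it also
// returns std::nullopt when den is zero.
std::optional<double> scale (const Count &num, const Count &den)
{
  if (!compatible_p (num, den) || den.val == 0)
    return std::nullopt;
  return static_cast<double> (num.val) / static_cast<double> (den.val);
}
```
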

Reply via email to