https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #10 from kugan at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #9)
> > > as mentioned by Andrew, it is important to clone and also resolve indirect
> > > calls. Those auto-FDO 0 may prevent it from happening.
> > > It is easy to see in perf profile if the functions are cloned.
> > >
> > > My overall plan is to combine autofdo with guessed profile, when autofdo
> > > samples are missing (i.e. we have 0 at input). There is no 100% correct
> > > way to do so, that is why I am trying to first get benchmarking set up
> > > and kind of working, and only then start tampering with the profile
> > > generation.
> >
> > Thanks for the information. I tried re-creating the same configuration and
> > the results unfortunately are the same. I will look at the dumps further.
>
> I will look if I can reproduce that ICE with to_sreal. It means that
> the counts are not compatible, but it is not clear from the backtrace
> why.
>
> Note that I think the main problem is that the code producing the BB profile
> makes autofdo 0 counts for all BBs it cannot successfully propagate to,
> which means that they are not optimized for performance later. For
> example, it tends to prevent unrolling. If a loop was unrolled in the
> train run, we will not have enough info to determine its iteration
> count, and we may leave count 0 in the header of the loop.
>
> We need to fill in the data from static profile and indicate that in the
> count->quality () (i.e. GUESSED versus AFDO).
>
> I looked into the propagation algorithm yesterday and made it
> propagate even if the info is not complete. This is not quite a correct
> solution, but it mitigates the problem and reduces the performance gap from
> 4% to 2% in my SPEC runs.
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> index e12b3048f20..9ebd1a203fe 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1307,21 +1307,47 @@ afdo_propagate_edge (bool is_succ, bb_set *annotated_bb)
>          total_known_count += AFDO_EINFO (e)->get_count ();
>          num_edge++;
>        }
> +    if (dump_file)
> +      {
> +        fprintf (dump_file, "bb %i annotated %i dir %s edges %i, "
> +                 "unknown edges %i, known count ",
> +                 bb->index, is_bb_annotated (bb, *annotated_bb),
> +                 is_succ ? "succesors" : "predecessors", num_edge, num_unknown_edge);
> +        total_known_count.dump (dump_file);
> +        fprintf (dump_file, " bb count ");
> +        bb->count.dump (dump_file);
> +        fprintf (dump_file, "\n");
> +      }
> +    if (total_known_count > bb->count)
> +      {
> +        if (dump_file)
> +          {
> +            fprintf (dump_file, " Updating count of bb %i ", bb->index);
> +            bb->count.dump (dump_file);
> +            fprintf (dump_file, " -> ");
> +            total_known_count.dump (dump_file);
> +            fprintf (dump_file, "\n");
> +          }
> +        bb->count = total_known_count;
> +        changed = true;
> +      }
>
>      /* Be careful not to annotate block with no successor in special cases.  */
> -    if (num_unknown_edge == 0 && total_known_count > bb->count)
> +    if (num_unknown_edge == 0 && num_edge
> +        && !is_bb_annotated (bb, *annotated_bb))
>        {
> -        bb->count = total_known_count;
> -        if (!is_bb_annotated (bb, *annotated_bb))
> -          set_bb_annotated (bb, annotated_bb);
> +        if (dump_file)
> +          fprintf (dump_file, " Setting bb %i annotated\n", bb->index);
> +        set_bb_annotated (bb, annotated_bb);
>          changed = true;
>        }
>      else if (num_unknown_edge == 1 && is_bb_annotated (bb, *annotated_bb))
>        {
>          if (bb->count > total_known_count)
>            {
> -            profile_count new_count = bb->count - total_known_count;
> -            AFDO_EINFO(unknown_edge)->set_count(new_count);
> +            profile_count new_count = bb->count - total_known_count;
> +            AFDO_EINFO(unknown_edge)->set_count(new_count);
> +#if 0
>              if (num_edge == 1)
>                {
>                  basic_block succ_or_pred_bb = is_succ ? unknown_edge->dest : unknown_edge->src;
> @@ -1332,12 +1358,41 @@ afdo_propagate_edge (bool is_succ, bb_set *annotated_bb)
>                      set_bb_annotated (succ_or_pred_bb, annotated_bb);
>                    }
>                }
> +#endif
>            }
>          else
>            AFDO_EINFO (unknown_edge)->set_count (profile_count::zero().afdo ());
> +        if (dump_file)
> +          {
> +            fprintf (dump_file, " Annotated edge %i->%i with count ",
> +                     unknown_edge->src->index, unknown_edge->dest->index);
> +            AFDO_EINFO (unknown_edge)->get_count ().dump (dump_file);
> +            fprintf (dump_file, "\n");
> +          }
>          AFDO_EINFO (unknown_edge)->set_annotated ();
>          changed = true;
>        }
> +    else if (total_known_count >= bb->count
> +             && num_unknown_edge > 1
> +             && is_bb_annotated (bb, *annotated_bb))
> +      {
> +        FOR_EACH_EDGE (e, ei, is_succ ? bb->succs : bb->preds)
> +          {
> +            gcc_assert (AFDO_EINFO (e) != NULL);
> +            if (! AFDO_EINFO (e)->is_annotated ())
> +              {
> +                AFDO_EINFO(e)->set_count (profile_count::zero().afdo ());
> +                AFDO_EINFO (e)->set_annotated ();
> +                if (dump_file)
> +                  {
> +                    fprintf (dump_file, " Annotated edge %i->%i with count ",
> +                             e->src->index, e->dest->index);
> +                    AFDO_EINFO (unknown_edge)->get_count ().dump (dump_file);
> +                    fprintf (dump_file, "\n");
> +                  }
> +              }
> +          }
> +      }
>      }
>    return changed;
>  }
> @@ -1471,6 +1526,8 @@ afdo_propagate (bb_set *annotated_bb)
>        changed = true;
>        afdo_propagate_circuit (*annotated_bb);
>      }
> +  if (changed && dump_file)
> +    fprintf (dump_file, "Limit of 10 iterations reached\n");
>  }
>
>  /* Propagate counts on control flow graph and calculate branch

This is helping with the performance. However, I still have the ICE reported
above.

#2  0x0000000001917858 in profile_count::to_sreal_scale (this=0xffffefcd0958, in=..., known=0x0) at ../../gcc/gcc/profile-count.cc:342
342       gcc_checking_assert (compatible_p (in));
(gdb) p *this
$4 = {static n_bits = 61, static max_count = 2305843009213693950, static uninitialized_count = 2305843009213693951, m_val = 1073741820, m_quality = GUESSED_LOCAL}
(gdb) p in
$5 = {static n_bits = 61, static max_count = 2305843009213693950, static uninitialized_count = 2305843009213693951, m_val = 1230, m_quality = AFDO}

This is done with SPE, which can yield fewer samples than LBR. However, I
think this can still happen with LBR too.
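
To make the failing assert easier to reason about, here is a toy model of the
situation in the gdb output: a GUESSED_LOCAL count being scaled against an AFDO
count. This is not GCC's code. The names toy_quality, toy_count and
toy_compatible_p are hypothetical, and the rule they encode (locally guessed
counts and cross-function counts live on different scales and must not be
mixed, while zero or uninitialized counts mix with anything) is my simplified
assumption about what compatible_p checks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for GCC's profile quality levels.
// Only the levels needed for this illustration are modeled.
enum class toy_quality { UNINITIALIZED, GUESSED_LOCAL, AFDO };

struct toy_count
{
  std::uint64_t val;
  toy_quality quality;

  bool initialized_p () const { return quality != toy_quality::UNINITIALIZED; }

  // Assumption: AFDO counts are comparable across functions, while
  // GUESSED_LOCAL counts are meaningful only within one function.
  bool ipa_p () const { return quality == toy_quality::AFDO; }
};

// Assumed rule: uninitialized or zero counts mix with anything; otherwise
// both counts must be on the same (local vs. cross-function) scale.
inline bool
toy_compatible_p (const toy_count &a, const toy_count &b)
{
  if (!a.initialized_p () || !b.initialized_p ())
    return true;
  if (a.val == 0 || b.val == 0)
    return true;
  return a.ipa_p () == b.ipa_p ();
}
```

With the two values printed above (m_val = 1073741820 as GUESSED_LOCAL and
m_val = 1230 as AFDO), toy_compatible_p returns false in this model, which
matches the shape of the assertion failure in to_sreal_scale.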