On Thu, Oct 31, 2013 at 03:56:10PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung....@lge.com>
>
> It is possible that a callchain has cycles or recursive calls. In that
> case it'll end up having entries more than 100% overhead in the
> output. In order to prevent such entries, cache each callchain node
> and skip if same entry already cumulated.
>
> Cc: Arun Sharma <asha...@fb.com>
> Cc: Frederic Weisbecker <fweis...@gmail.com>
> Signed-off-by: Namhyung Kim <namhy...@kernel.org>
> ---
>  tools/perf/builtin-report.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
>
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 1b152a8b7f51..1a0de9a4a568 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -387,6 +387,11 @@ iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
>  	return err;
>  }
>
> +struct cumulative_cache {
> +	struct dso *dso;
> +	struct symbol *sym;
> +};
> +
>  static int
>  iter_prepare_cumulative_entry(struct add_entry_iter *iter,
>  			      struct machine *machine,
> @@ -394,9 +399,31 @@ iter_prepare_cumulative_entry(struct add_entry_iter *iter,
>  			      struct addr_location *al __maybe_unused,
>  			      struct perf_sample *sample)
>  {
> +	struct callchain_cursor_node *node;
> +	struct cumulative_cache *ccache;
> +
>  	callchain_cursor_commit(&callchain_cursor);
>
>  	/*
> +	 * This is for detecting cycles or recursions so that they're
> +	 * cumulated only one time to prevent entries more than 100%
> +	 * overhead.
> +	 */
> +	ccache = malloc(sizeof(*ccache) * PERF_MAX_STACK_DEPTH);
> +	if (ccache == NULL)
> +		return -ENOMEM;
> +
> +	node = callchain_cursor_current(&callchain_cursor);
> +	if (node == NULL)
> +		return 0;
Here you return without assigning iter->priv, nor iter->priv->dso /
iter->priv->sym.

> +
> +	ccache[0].dso = node->map->dso;
> +	ccache[0].sym = node->sym;
> +
> +	iter->priv = ccache;
> +	iter->curr = 1;

Because the assignment is done here.

> +
> +	/*
>  	 * The first callchain node always contains same information
>  	 * as a hist entry itself. So skip it in order to prevent
>  	 * double accounting.
> @@ -501,8 +528,29 @@ iter_add_next_cumulative_entry(struct add_entry_iter *iter,
>  {
>  	struct perf_evsel *evsel = iter->evsel;
>  	struct perf_sample *sample = iter->sample;
> +	struct cumulative_cache *ccache = iter->priv;
>  	struct hist_entry *he;
>  	int err = 0;
> +	int i;
> +
> +	/*
> +	 * Check if there's duplicate entries in the callchain.
> +	 * It's possible that it has cycles or recursive calls.
> +	 */
> +	for (i = 0; i < iter->curr; i++) {
> +		if (sort__has_sym) {
> +			if (ccache[i].sym == al->sym)
> +				return 0;
> +		} else {
> +			/* Not much we can do - just compare the dso. */
> +			if (ccache[i].dso == al->map->dso)

sym and dso are used here.

> +				return 0;
> +		}
> +	}
> +
> +	ccache[i].dso = al->map->dso;
> +	ccache[i].sym = al->sym;
> +	iter->curr++;
>
>  	he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
>  				sample->period, sample->weight,
> @@ -538,6 +586,7 @@ iter_finish_cumulative_entry(struct add_entry_iter *iter,
>  	evsel->hists.stats.total_period += sample->period;
>  	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
>
> +	free(iter->priv);

And here I'm seeing a double free when trying the patchset with other
examples. I added a printf to the "if (node == NULL)" case and I'm hitting
it. So it seems to me that, when the entry is reused, every user frees it,
hence the double free.

This is my first time looking at perf code, so I might be missing a LOT of
things, sorry in advance :)

I tried copying the dso and sym to the newly allocated memory (and assigning
iter->priv = ccache before the return in the "node == NULL" case), as shown
in the attached patch, but when running under valgrind it also reported some
invalid reads and segfaults (without valgrind it didn't segfault, but I must
just be "lucky").

So if there is no node (node == NULL) and we cannot read the dso and sym from
the current values of iter->priv (they show up as invalid reads in valgrind),
I'm not sure where we can read them from. And, IIUC, we need to initialize
them because they are used later.

So maybe there are only some cases where we can read iter->priv, and in the
other cases we just initialize it to something (although that doesn't feel
possible because it's the dso and sym)? Or should we read/copy them from some
other place (maybe before some other thing is freed)? Or maybe forget about
the malloc when node == NULL, just keep using iter->priv, and not execute the
free until iter->curr == 1? I added that check around the free, but it didn't
help, although I didn't really look at how iter->curr is used.

What am I missing? I'm not really sure what the fix for this is.

Also, just in case, I tried assigning "iter->priv = NULL" after it's freed
and it """fixes""" it. Just reverting the patch (it reverts without conflict)
also solves the double free problem for me (although it probably reintroduces
the problem the patch tries to fix =) and seems to make valgrind happy too.

Thanks a lot, and sorry again if I'm completely missing some
"rules/invariants", I'm really new to perf :)

Rodrigo
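
P.S. To be concrete, the "iter->priv = NULL" workaround I mentioned is just
the following at the end of iter_finish_cumulative_entry(). It's only a
sketch of what I tried to confirm the double free, not a proper fix, since it
merely hides whoever still frees (or reuses) the stale iter->priv:

	free(iter->priv);
	iter->priv = NULL;	/* a later free(iter->priv) becomes a no-op */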
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 281053b..a067be8 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -421,15 +421,20 @@ iter_prepare_cumulative_entry(struct add_entry_iter *iter,
 		return -ENOMEM;
 
 	node = callchain_cursor_current(&callchain_cursor);
-	if (node == NULL)
+	if (node == NULL) {
+		struct cumulative_cache *tmp = iter->priv;
+		ccache[0].dso = tmp[0].dso;
+		ccache[0].sym = tmp[0].sym;
+		iter->priv = ccache;
 		return 0;
-
-	ccache[0].dso = node->map->dso;
-	ccache[0].sym = node->sym;
+	}
 
 	iter->priv = ccache;
 	iter->curr = 1;
 
+	ccache[0].dso = node->map->dso;
+	ccache[0].sym = node->sym;
+
 	/*
 	 * The first callchain node always contains same information
 	 * as a hist entry itself. So skip it in order to prevent