Hi Frederic, On Wed, 15 Jan 2014 17:59:30 +0100, Frederic Weisbecker wrote: > On Wed, Jan 15, 2014 at 03:23:46PM +0900, Namhyung Kim wrote: >> On Tue, 14 Jan 2014 16:37:15 +0100, Frederic Weisbecker wrote: >> > When a new callchain child branch matches an existing one in the rbtree, >> > the comparison of its first entry is performed twice: >> > >> > 1) From append_chain_children() on branch lookup >> > >> > 2) If 1) reports a match, append_chain() then compares all entries of >> > the new branch against the matching node in the rbtree, and this >> > comparison includes the first entry of the new branch again. >> >> Right. >> >> > >> > Lets shortcut this by performing the whole comparison only from >> > append_chain() which then returns the result of the comparison between >> > the first entry of the new branch and the iterating node in the rbtree. >> > If the first entry matches, the lookup on the current level of siblings >> > stops and propagates to the children of the matching nodes. >> >> Hmm.. it looks like that I thought directly calling append_chain() has >> some overhead - but it's not. > > No that's a right concern. I worried as well because I wasn't sure if there > is more match than unmatch on the first entry. I'd tend to think that the > first > entry endures unmatches most often, in which case calling match_chain() first > may be more efficient as a fast path (ie: calling append_chain() involves > one more function call and a few other details). > > But eventually measurement hasn't shown significant difference before and > after the patch.
I think if the sort key doesn't contain "symbol", unmatch case would be increased as more various callchains would go into a same entry. > >> >> > >> > This results in less comparisons performed by the CPU. >> >> Do you have any numbers? I suspect it'd not be a big change, but just >> curious. > > So I compared before/after the patchset (which include the cursor restore > removal) > with: > > 1) Some big hackbench-like load that generates > 200 MB perf.data > > perf record -g -- perf bench sched messaging -l $SOME_BIG_NUMBER > > 2) Compare before/after with the following reports: > > perf stat perf report --stdio > /dev/null > perf stat perf report --stdio -s sym > /dev/null > perf stat perf report --stdio -G > /dev/null > perf stat perf report --stdio -g fractal,0.5,caller,address > /dev/null > > And most of the time I had < 0.01% difference on time completion in favour of > the patchset > (which may be due to the removed cursor restore patch eventually). > > So, all in one, there was no real interesting difference. If you want the > true results I can definetly relaunch the tests. So as an extreme case, could you please also test "-s cpu" case and share the numbers? Thanks, Namhyung -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/