On Wed, Jan 15, 2014 at 03:23:46PM +0900, Namhyung Kim wrote: > On Tue, 14 Jan 2014 16:37:15 +0100, Frederic Weisbecker wrote: > > When a new callchain child branch matches an existing one in the rbtree, > > the comparison of its first entry is performed twice: > > > > 1) From append_chain_children() on branch lookup > > > > 2) If 1) reports a match, append_chain() then compares all entries of > > the new branch against the matching node in the rbtree, and this > > comparison includes the first entry of the new branch again. > > Right. > > > > > Lets shortcut this by performing the whole comparison only from > > append_chain() which then returns the result of the comparison between > > the first entry of the new branch and the iterating node in the rbtree. > > If the first entry matches, the lookup on the current level of siblings > > stops and propagates to the children of the matching nodes. > > Hmm.. it looks like that I thought directly calling append_chain() has > some overhead - but it's not.
No that's a right concern. I worried as well because I wasn't sure if there is more match than unmatch on the first entry. I'd tend to think that the first entry endures unmatches most often, in which case calling match_chain() first may be more efficient as a fast path (ie: calling append_chain() involves one more function call and a few other details). But eventually measurement hasn't shown significant difference before and after the patch. > > > > > This results in less comparisons performed by the CPU. > > Do you have any numbers? I suspect it'd not be a big change, but just > curious. So I compared before/after the patchset (which include the cursor restore removal) with: 1) Some big hackbench-like load that generates > 200 MB perf.data perf record -g -- perf bench sched messaging -l $SOME_BIG_NUMBER 2) Compare before/after with the following reports: perf stat perf report --stdio > /dev/null perf stat perf report --stdio -s sym > /dev/null perf stat perf report --stdio -G > /dev/null perf stat perf report --stdio -g fractal,0.5,caller,address > /dev/null And most of the time I had < 0.01% difference on time completion in favour of the patchset (which may be due to the removed cursor restore patch eventually). So, all in one, there was no real interesting difference. If you want the true results I can definetly relaunch the tests. > > > > Signed-off-by: Frederic Weisbecker <fweis...@gmail.com> > > Reviewed-by: Namhyung Kim <namhy...@kernel.org> Thanks! > > Thanks, > Namhyung -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/