On Wed, Jan 15, 2014 at 03:23:46PM +0900, Namhyung Kim wrote:
> On Tue, 14 Jan 2014 16:37:15 +0100, Frederic Weisbecker wrote:
> > When a new callchain child branch matches an existing one in the rbtree,
> > the comparison of its first entry is performed twice:
> >
> > 1) From append_chain_children() on branch lookup
> >
> > 2) If 1) reports a match, append_chain() then compares all entries of
> > the new branch against the matching node in the rbtree, and this
> > comparison includes the first entry of the new branch again.
> 
> Right.
> 
> >
> > Let's shortcut this by performing the whole comparison only from
> > append_chain() which then returns the result of the comparison between
> > the first entry of the new branch and the iterating node in the rbtree.
> > If the first entry matches, the lookup on the current level of siblings
> > stops and propagates to the children of the matching nodes.
> 
> Hmm..  it looks like I thought that directly calling append_chain() had
> some overhead - but it doesn't.

No, that's a valid concern. I worried about it as well, because I wasn't sure
whether the first entry matches more often than it mismatches. I'd tend to
think that the first entry mismatches most of the time, in which case calling
match_chain() first may be more efficient as a fast path (i.e. calling
append_chain() involves one more function call and a few other details).

But in the end, measurements haven't shown any significant difference before
and after the patch.
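
To make the duplicated comparison concrete, here is a toy model of the two
lookup strategies. It is only a sketch, not the perf code: the toy_* names are
hypothetical, the siblings sit in a flat array instead of the rbtree, and plain
ints stand in for callchain entries. nr_cmp counts entry comparisons.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the perf implementation. */
struct toy_node {
	const int *entries;	/* stand-ins for callchain entries */
	size_t nr;		/* number of entries in this node */
};

static unsigned long nr_cmp;	/* entry comparisons performed so far */

/* Compare two entries and count the comparison. Returns <0, 0 or >0. */
static int toy_match(int a, int b)
{
	nr_cmp++;
	return (a > b) - (a < b);
}

/*
 * Compare the new branch against one node, entry by entry, stopping at
 * the first mismatch. Returns the result of the first entry comparison.
 */
static int toy_append(const struct toy_node *node, const int *branch, size_t nr)
{
	int first = 0;

	for (size_t i = 0; i < node->nr && i < nr; i++) {
		int ret = toy_match(node->entries[i], branch[i]);

		if (i == 0)
			first = ret;
		if (ret)
			break;
	}
	return first;
}

/* Before: match the first entry on lookup, then compare it again on append. */
static int toy_lookup_old(const struct toy_node *sib, size_t nr_sib,
			  const int *branch, size_t nr)
{
	for (size_t i = 0; i < nr_sib; i++) {
		assert(sib[i].nr > 0 && nr > 0);
		if (toy_match(sib[i].entries[0], branch[0]) == 0) {
			toy_append(&sib[i], branch, nr);
			return (int)i;
		}
	}
	return -1;
}

/* After: a single pass whose return value doubles as the lookup result. */
static int toy_lookup_new(const struct toy_node *sib, size_t nr_sib,
			  const int *branch, size_t nr)
{
	for (size_t i = 0; i < nr_sib; i++) {
		if (toy_append(&sib[i], branch, nr) == 0)
			return (int)i;
	}
	return -1;
}
```

In this model the second variant saves exactly one comparison per lookup that
finds a matching sibling, and costs the same on every mismatching sibling.
What the model does not capture is the extra call overhead discussed above.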

> 
> >
> > This results in fewer comparisons performed by the CPU.
> 
> Do you have any numbers?  I suspect it'd not be a big change, but just
> curious.

So I compared before/after the patchset (which includes the cursor restore
removal) with:

        1) Some big hackbench-like load that generates > 200 MB perf.data

        perf record -g -- perf bench sched messaging -l $SOME_BIG_NUMBER

        2) Compare before/after with the following reports:

        perf stat perf report --stdio > /dev/null
        perf stat perf report --stdio -s sym > /dev/null
        perf stat perf report --stdio -G > /dev/null
        perf stat perf report --stdio -g fractal,0.5,caller,address > /dev/null 

And most of the time I had < 0.01% difference in completion time in favour of
the patchset (which may ultimately be due to the cursor restore removal patch).

So, all in all, there was no really interesting difference. If you want the
actual results I can definitely relaunch the tests.

> >
> > Signed-off-by: Frederic Weisbecker <fweis...@gmail.com>
> 
> Reviewed-by: Namhyung Kim <namhy...@kernel.org>

Thanks!

> 
> Thanks,
> Namhyung