* Frederic Weisbecker <fweis...@gmail.com> wrote:

> On Thu, Sep 12, 2013 at 10:36:58PM +0200, Ingo Molnar wrote:
> >
> > * Frederic Weisbecker <fweis...@gmail.com> wrote:
> >
> > > The way we handle hists sorted by comm is to first gather them by tid
> > > then in the end merge/collapse hists that end up with the same comm.
> > >
> > > But merging hists has shown some performance issues, especially with
> > > callchains, where the operation can be very heavy.
> > >
> > > So this new comm infrastructure aims at removing comm collapses. It
> > > brings two features:
> > >
> > > 1) Keep track of the comm lifecycle by storing timestamps when the comms
> > > are set. This way we can map the precise comm to any thread:time couple.
> > > This only works if the PERF_SAMPLE_ID comes along with comm and fork
> > > events, otherwise we only track the latest comm set for a thread.
> > >
> > > This can provide us more precise comm sorted hists by distinguishing pre
> > > and post exec timeframes into separate hists for a single thread.
> > >
> > > Note that although the comm infrastructure is ready to do this, I
> > > haven't yet made the perf tools support that. It's a TODO entry.
> > >
> > > 2) Allocate comms only once instead of duplicating them for all threads
> > > sharing the same one. Two threads having the same comm should now point
> > > to the same string. As a result we can compare hists thread comm by
> > > address.
> > >
> > > The big upside is that we can now live sort comm hists instead of
> > > collapsing them at the end of the processing.
> > >
> > > I've seen very nice performance results on perf report. Roughly a 1.5x
> > > to 2x speedup on perf report default stdio output with callchains.
> > >
> > > You can try this branch:
> > >
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > >       perf/comm
> > >
> > > Maybe merging that with Namhyung's callchain patches could provide some
> > > nice cumulative results.
> >
> > It would be nice to try Linus's testcase, which is, in essence, a kernel
> > build profile:
> >
> >   make defconfig
> >   perf record -g make -j64 bzImage
> >
> > and to make sure that it can analyze the data in sane, non-annoying
> > runtimes. What I saw was 30 minutes of runtime - a 2x improvement is not
> > nearly enough, 15 minutes is still an eternity.
>
> I doubt we can reach anything near non-annoying runtimes after
> recording all the callchains of a whole kernel build perf record.
>
> My patches and Namhyung's should improve the comm situation a lot but we
> can't work miracles. The only way would perhaps be to limit the depth
> of the callchain branches.
>
> Now maybe we can find other big contention points in perf. It's possible
> we also have some endless loop somewhere.

Well, it was the 100,000+ step linear list walk that was causing 90% of
the slowness here. Namhyung's patch should dramatically improve that.

I guess it's time for someone to post a combined tree so that it can be
tested all together?

Thanks,

	Ingo
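
P.S. For anyone following the thread, point 2) of the quoted cover letter
("allocate comms only once ... compare hists thread comm by address") can be
sketched roughly as below. This is only an illustration under my own
assumptions, not the code in the perf/comm branch: comm_intern(),
comm_equal(), struct comm_str as laid out here, and the fixed-size hash
table are all hypothetical names and choices made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interning table; NOT the actual perf implementation. */
#define COMM_HASH_SIZE 256

struct comm_str {
	struct comm_str *next;	/* chain of entries in the same bucket */
	char str[];		/* the one shared copy of the comm */
};

static struct comm_str *comm_hash[COMM_HASH_SIZE];

/* Simple string hash (djb2 variant), reduced to a bucket index. */
static unsigned int comm_hash_idx(const char *s)
{
	unsigned int h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % COMM_HASH_SIZE;
}

/*
 * Return the single shared copy of @name, allocating it on first use.
 * Every thread with the same comm ends up holding the same pointer.
 * Returns NULL if the allocation fails.
 */
static const char *comm_intern(const char *name)
{
	unsigned int idx;
	struct comm_str *cs;
	size_t len;

	assert(name != NULL);
	idx = comm_hash_idx(name);

	for (cs = comm_hash[idx]; cs; cs = cs->next)
		if (!strcmp(cs->str, name))
			return cs->str;

	len = strlen(name);
	cs = malloc(sizeof(*cs) + len + 1);
	if (!cs)
		return NULL;
	memcpy(cs->str, name, len + 1);
	cs->next = comm_hash[idx];
	comm_hash[idx] = cs;
	return cs->str;
}

/* Once interned, two comms are equal iff their addresses are equal. */
static int comm_equal(const char *a, const char *b)
{
	return a == b;
}
```

With something along these lines, sorting hists by comm only needs a
pointer comparison per entry instead of a strcmp(), which is what makes
live sorting possible without a collapse pass at the end.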