Hi Gonzalo,

> I installed perf top but I am not sure how to use it. I will investigate it.
Assuming you have built GNU Radio/your application with debugging symbols (for example, by configuring with "cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo .."), try something like:

    sudo sysctl kernel/perf_event_paranoid=-1
    perf record -a {your program}
    perf report

Best regards,
Marcus

On 03/07/2016 10:30 PM, Gonzalo Arcos wrote:
> Thanks for your answer.
>
> I installed perf top but I am not sure how to use it. I will investigate it. However, does the program need to be compiled in debug mode for the performance counters to have effect?
>
> As a side question: has anyone managed to profile a GNU Radio application with valgrind/oprofile? I am very interested in getting this to work; when I tried profiling with those tools and then opened KCachegrind, the displayed graph did not contain information about each block, let alone functions inside blocks. It has been several months since I tried this, but I remember that roughly 99.9% of the time was attributed to the start() function of the block, and I could not get any more information than that, which of course was not helpful at all.
>
> 2016-02-29 6:28 GMT-03:00 West, Nathan <n...@ostatemail.okstate.edu>:
>
> > It won't give you time spent, but 'perf top' is a nice tool that gives function-level performance counters for all running code. It comes with linux-tools and uses performance counters built into the kernel. There are also a couple of other perf subtools you can explore.
> >
> > Regarding your full buffers, I think that's a result of GNU Radio's scheduler. If you have a flowgraph with A->B, and B takes a very long time to process all of its samples, then A will always have full output buffers, since it operates much faster. That's not necessarily bad or cause for concern, but performance improvements should focus on B.
> > -nathan
> >
> > On Sun, Feb 28, 2016 at 10:48 PM, Gonzalo Arcos <gonzaloarco...@gmail.com> wrote:
> >
> > > Thanks to all of you for your very informative answers.
> > >
> > > Douglas, I feel good now because you have described perfectly all the things I did/thought of to improve the performance :). I also agree that merging blocks should be a last resort. I have used the performance monitor and managed to improve the performance of the most expensive blocks. What I could not achieve, though, is profiling the program with a mainstream profiler like valgrind or oprofile, or some other profiler for Python. I remember that when visualizing the data, all the time was spent in the start() of the top block, and I could not get information about each block's general work, let alone the functions executed within the block. After discovering the performance monitor, I used it in conjunction with calls to clock() to determine the time spent in each function within each block, to get a rough measurement. But if it is possible to get this information automatically, I am very interested in learning how to do it. Could you help me?
> > >
> > > There is also another interesting aspect of improving performance: blocks being blocked because their output buffer is full. I've tried playing around a bit with the min and max output buffer sizes, but the performance did not seem to be affected. After using the performance monitor to look at the average buffer fullness, I see that most buffers are relatively full; however, I do not know if they are full enough to make an upstream block wait to push data into the buffer.
> > > 2016-02-28 19:39 GMT-03:00 Douglas Geiger <doug.gei...@bioradiation.net>:
> > >
> > > > The phenomenon Sylvain is pointing at is basically the fact that as compilers improve, you should expect the 'optimized' proto-kernels to no longer show as dramatic an improvement over the generic ones. As to your question of 'is it worth it', that comes down to a couple of things: for example, how much of an improvement do you require to be 'worth it' (i.e., how much is your time worth, and/or how much of a performance improvement does your application require)? Similarly, is it worth it to you to get cross-platform improvements (which is one of the features of VOLK)? Or, perhaps, is it worth it to you just to learn how to use VOLK?
> > > >
> > > > A couple of thoughts here: in general, when I have a flowgraph that is not meeting my performance requirements, my first step is to do some coarse profiling (e.g. via gr-perf-monitorx) to determine whether a single block is my primary performance bottleneck. If so, that is the block I concentrate on for optimizations (via VOLK and/or any algorithmic improvements, e.g. can I turn any run-time calculations into a look-up table computed either at compile time or within the constructor?). If there is not a clear bottleneck, I next look a little deeper using perf/oprofile at which functions my flowgraph is spending a lot of time in: can I, for example, create a faster version of some primitive calculation that all my blocks use a lot, and thereby get a speed-up across many blocks, which should translate into a faster overall application?
> > > >
> > > > Finally, if I still need more improvement, I would look at collecting many blocks together into a single, larger block.
> > > > This is generally less desirable, since you now have a (more) application-specific block and it becomes harder to re-use in later projects; but if you have performance requirements that drive you there, it absolutely is an option. At this point you likely have multiple operations being applied to your incoming samples, and it becomes easy to collect all of those into a single, larger VOLK call (and, from there, to create a SIMD-ized proto-kernel that targets your particular platform). So, while re-usability of code drives you away from this scenario, it offers the greatest potential for performance improvements, and is thus where many applications with high performance requirements tend to gravitate. Ideally you can strike a balance between the two: have widely re-usable blocks, but with a set of operations inside them that can take advantage of e.g. SIMD-ized function calls to make them high-performance. If you can, craft the block to be widely re-usable for a certain class of things (e.g. look at how the OFDM blocks are set up to be easily re-configurable for the many ways an OFDM waveform can be crafted). In the long run, having more knobs to turn to customize your existing code base for whatever new scenario you are looking at 1/2/10 years from now is always better than a brittle solution that solves today's problem but is difficult to re-use for tomorrow's.
> > > >
> > > > Hope that was helpful. If you are interested in learning more about how to use VOLK, certainly have a look at libvolk.org; the documentation is (I think) fairly good at introducing the concepts and intent, as well as how the API looks/works. And certainly don't be shy about asking more questions here.
> > > > Good luck,
> > > > Doug
> > > >
> > > > On Sun, Feb 28, 2016 at 1:58 AM, Sylvain Munaut <246...@gmail.com> wrote:
> > > >
> > > > > > Just wanted to ask the more experienced users if you think this idea is worth a shot, or whether the performance improvement will be marginal.
> > > > >
> > > > > The performance improvement is vastly dependent on the operation you're doing.
> > > > >
> > > > > You can get an idea of the improvement by comparing the volk_profile output for the generic kernel (coded in pure C) and the sse/avx ones.
> > > > >
> > > > > For instance, on my laptop, for some very simple operations (like float add) the generic kernel is barely slower than the SIMD one, most likely because it is so simple that even the compiler was able to SIMD-ize it by itself. But for other things (like complex multiply), the SIMD version is 10x faster...
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Sylvain
> > > >
> > > > --
> > > > Doug Geiger
> > > > doug.gei...@bioradiation.net
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio