jody wrote:

Hi
I wanted to profile my application using gprof, and proceeded like
when profiling a normal application:
- compile everything with option -pg
- run application
- call gprof
This returns normal-looking output, but I don't know
whether this is the data for node 0 only or accumulated for all nodes.

Does anybody have experience in profiling parallel applications?
Is there a way to have profile data for each node separately?
If not, is there another profiling tool which can?

Gosh, I'm trying not to sound like a repeating commercial, but this is a rather direct answer to your question.

If you use Sun Studio compilers and tools, there is a Performance Analyzer. The basic mode of operation is that it samples the callstack periodically. So, you don't get the huge data volumes that tracing tools generate, but you do get statistically fair data that shows where time is spent. If you preface your "mpirun" command with "collect", then you get data for all the MPI processes in your job. You can look at data aggregated over all processes or for some subset. You can get gprof-style information about where time is spent. You can also trace MPI calls, the memory heap, hardware events (like cache misses), etc. The tool is available from http://developers.sun.com/sunstudio/ as a free download for Linux and Solaris on x86 and SPARC. You don't need to compile your program specially (I mean, no -pg). Fine print applies to every statement I'm making in this paragraph, but I'm trying to keep it short.
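To make "preface your mpirun command with collect" concrete, here is a minimal sketch. The program name ./my_app and the process count of 4 are hypothetical placeholders, and the exact options that collect expects for MPI jobs may differ between Sun Studio releases, so treat this as an illustration and check the collect man page for your version.

```shell
#!/bin/sh
# Sketch only: ./my_app and the process count are hypothetical placeholders,
# and the exact collect options may depend on the Sun Studio release.
NP=4
APP=./my_app

# The usual mpirun command line, prefaced with "collect".
CMD="collect mpirun -np $NP $APP"

if command -v collect >/dev/null 2>&1; then
    $CMD    # run the job under the collector
else
    # Tools not installed: just show the command that would be run.
    echo "collect not found; would run: $CMD"
fi
```

The resulting experiment can then be opened in the Performance Analyzer to look at all processes together or at a subset.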

Again, sorry if it sounds like a commercial, but it's intended to be a direct answer to your question.

P.S. If you go to http://developers.sun.com/sunstudio/documentation/demos/index.jsp , "halfway down" is a set of presentations on "How to Perform Analysis". These can give you more information on the Performance Analyzer. I don't know how much, if any, is specific to MPI, but it should be helpful.
