David Mathog wrote:

Is there a tool in openmpi that will reveal how much "spin time" the processes are using?

I don't know what sort of answer is helpful for you, but I'll describe one option.

With Oracle Message Passing Toolkit (formerly Sun ClusterTools; in any case, an Open MPI distribution from Oracle/Sun) and Oracle Solaris Studio Performance Analyzer (formerly Sun Studio Performance Analyzer), you can see how much time is spent in MPI work, MPI wait, and so on. Specifically, you can break this down by process. For example (the numbers are made up), you might see that process 2 spent:
* 35% of its time in application-level computation
* 5% of its time in MPI moving data
* 60% of its time in MPI waiting
but process 7 spent:
* 90% of its time in application-level computation
* 5% of its time in MPI moving data
* only 5% of its time in MPI waiting
That is, beyond the usual profiling support you might find in other tools, with Performance Analyzer you can distinguish time spent in MPI moving data from time spent in MPI waiting.

On the other hand, you may not need that much detail. For your purposes, it may be enough to know how much time each process spends in MPI, and various profiling tools will give you that. See http://www.open-mpi.org/faq/?category=perftools for pointers. Load balancing is a common problem people investigate with such tools.

Finally, if you want to stick with tools like top, another alternative might be to make your application sleep while it waits. I can't say this is the best choice, but it could be fun/interesting. Say your application calls only a handful of different MPI functions. Write PMPI wrappers for them that convert blocking calls (e.g., MPI_Send, MPI_Recv) into their non-blocking counterparts (MPI_Isend, MPI_Irecv), followed by a loop of MPI_Test calls mixed with short sleeps. Not pretty, but it might be doable for your case; I don't know. Anyhow, a process that sleeps while it waits uses little CPU, so MPI wait time would show up as reduced CPU usage in tools like top.
