(Using svn 'trunk' revision 7927 of OpenMPI):

I've found an interesting issue with OpenMPI and the mvapi btl mca: most of the benchmarks I've tried (HPL, HPCC, Presta, IMB) do not run properly when the number of processes is sufficiently large. The threshold seems to be 65 processes in every case; with more than 65 processes, things get stuck:

IMB: wedges itself before finishing its first test (PingPong, 0 bytes, 2 processes). Even when the number of processes is small enough to run, it may not finish (error message in attachment).

HPCC: wedges itself after starting the PTRANS section of the benchmark (but before producing any results).

HPL: behaves similarly to IMB and HPCC; it doesn't finish even the smallest of problem sizes.

Presta: the 'com' test almost completes; it only fails when matching rank id pairs (and only then with more than 65 processes).
                the 'allred' test behaves like IMB, HPCC, and HPL.
                the 'laten' test partially works (its misbehavior is similar to 'com').
                the 'globalop' test was a dog on 4 nodes (some odd 360 times slower on mvapi than on mx); it'll take a while to verify whether it tickles the 65-process issue or not.

Note: the cluster I am testing on consists of dual-Opteron nodes. For purposes of comparison, I modified the machines file to start one process per node (50 nodes total). That run completed with no complications, so the problem seems to be related to the process count, not the node count.
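For reference, the one-process-per-node comparison amounts to a machines-file change along these lines (the hostnames and launch command here are made up for illustration; the slots syntax is the standard OpenMPI hostfile form):

```
# Normal run: two processes per dual-Opteron node (100 processes on 50 nodes)
node001 slots=2
node002 slots=2
...

# Comparison run: one process per node (50 processes on 50 nodes)
node001 slots=1
node002 slots=1
...

# launched with something like:
#   mpirun --hostfile machines -np 50 ./benchmark
```

With slots=1 the total process count stays at 50, below the apparent 65-process threshold, which is consistent with the hang depending on process count rather than node count.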

Note part two: the attached config.log is from a slightly newer version of openMPI (revision 7998; the difference from the trunk above is about 4-5 files, none of which have anything to do with mvapi). I really need to start keeping the config.log before blasting it into oblivion.

Unfortunately, I don't have enough Myrinet hardware to test more than 4 nodes with GM or MX; sorry.

Attachment: results.tar.bz2
Description: application/bzip2
