The attached tarball does not include the results for the MPICH variants (the tarball is 87 kB as it is).
I can run the same tests with MVAPICH, MPICH-GM, and MPICH-MX with no problems. The benchmarks were built from source RPMs (that I maintain), so I can say the build procedure for the benchmarks is essentially identical from one MPI to another.
A short summary:

* Identical hardware, except for the interconnect.
* Linux, SLES 9 SP2, kernel 2.6.5-7.201-smp (SLES binary)
* Opteron 248s, two CPUs per node, 4 GB per node.
* Four nodes in every test run.

I used the following interconnects/drivers:

* Myrinet (GM 2.0.22 and MX 1.0.3)
* InfiniBand (Mellanox "IB Gold" 1.8)

And the following benchmarks/tests:

* HPC Challenge (v1.0)
* HPL (v1.0)
* Intel MPI Benchmark (IMB, formerly PALLAS) v2.3
* Presta MPI Benchmarks

Quick summary of results:

HPC Challenge:
* Never completed an entire run on any interconnect
  - MVAPI came close; crashed after the HPL section.
    Error messages:
      [n60:21912] *** An error occurred in MPI_Reduce
      [n60:21912] *** on communicator MPI_COMM_WORLD
      [n60:21912] *** MPI_ERR_OP: invalid reduce operation
  - GM wedges itself in the HPL section.
  - MX crashes during the PTRANS test (the first test performed).

(See the earlier thread on this list about OpenMPI wedging itself; I did apply that workaround.)

HPL:
* Only completes with one interconnect:
  - MVAPI mca btl works fine.
  - GM wedges itself, similar to HPCC.
  - MX gives an error:
      MX: assertion: <<not yet implemented>> failed at line 281, file ../mx__shmem.c

IMB:
* Only completes with one interconnect:
  - MVAPI mca btl works fine.
  - GM fails, though the portion of the benchmark at which it gets stuck differs.
  - MX fails, giving both the error listed in the HPL section and:
      "mx_connect fail for 0th remote address key deadbeef (error Operation timed-out)"

Presta:
* Completes with varying degrees of success:
  - MVAPI: Completes successfully, but the 'all reduction' test is 173 times slower than the same test on GM, and 360 times slower than with MX.
  - GM: Does not complete the 'com' test; it simply stops at the same point every time (I have included this in my logs).
  - MX: Completes successfully, but I do receive the "mx_connect fail for 0th remote address key deadbeef (error Operation timed-out)" message.

I hope I've provided enough information to be useful; if not, just ask and I'll help out as much as I can.
openmpi.tar.bz2
Description: application/bzip2