I've been trying out the RC4 builds of OpenMPI, using Myrinet (gm), InfiniBand (mvapi), and TCP.

Benchmarks such as IMB (formerly the Pallas MPI Benchmarks, IIRC) run without problems, as does a simple hello world (sketched below).
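For reference, the hello world is nothing more than the canonical MPI boilerplate; a minimal sketch (any tutorial variant of it would do):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);                    /* start up MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);      /* total process count */
        printf("hello from rank %d of %d\n", rank, size);
        MPI_Finalize();                            /* shut down MPI */
        return 0;
    }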

However, with HPL (and HPCC, which is a superset of HPL) I have run into a problem: when HPL runs (or when execution reaches the HPL portion of HPCC), the processes seem to get wedged...

I have no problems building and running HPL and HPCC against the MPICH variants (including MVAPICH and MPICH-GM/MX) and LAM, and no problems with the gcc, Intel, PGI, or PathScale compilers.

The HPL.dat (and hpccinf.txt) files are identical across machines, and the machines themselves are identically configured except for the interconnect. A minimal excerpt of the input file layout is below for reference.
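(The values here are illustrative, not my exact file; the layout is the stock HPL.dat that ships in HPL's setup directory.)

    HPLinpack benchmark input file
    Innovative Computing Laboratory, University of Tennessee
    HPL.out      output file name (if any)
    6            device out (6=stdout,7=stderr,file)
    1            # of problems sizes (N)
    2000         Ns
    1            # of NBs
    64           NBs
    0            PMAP process mapping (0=Row-,1=Column-major)
    1            # of process grids (P x Q)
    2            Ps
    2            Qs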

However, when running the HPL code on OpenMPI, HPL pegs the CPUs and runs until I feel like killing it. If the problem occupies more than a fraction of a percent of free system memory (roughly 0.1% of free memory; the system has 2 GB/CPU in my case), HPL and HPCC will not finish computing that problem size (some rough arithmetic on what that threshold means is below). Case in point: an N small enough to take 1-2 seconds with MPICH, MPICH-GM, MVAPICH, or LAM doesn't complete after several minutes on OpenMPI.
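To put that 0.1% figure in perspective (this is my arithmetic, reading the threshold as the matrix's memory footprint): HPL factors a dense N x N matrix of doubles, so the footprint is roughly 8 * N^2 bytes. With 4 GB on a dual-CPU node:

    footprint(N) ~ 8 * N^2 bytes       (double-precision N x N matrix)
    0.1% of 4 GB ~ 4 MB  ->  N ~ sqrt(4e6 / 8) ~ 700

In other words, even trivially small problems wedge.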

I'm therefore somewhat confused: I've seen posts from people who claim to have run HPL with OpenMPI, and I've had no issues running other benchmarks on OpenMPI, but HPL-based code seems to wedge itself... The behavior is consistent whether I use Myrinet, InfiniBand, or Ethernet.
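For what it's worth, I'm launching with a stock mpirun along these lines (hostfile contents and process counts are placeholders, and I'm assuming the btl component names from the 1.0-series tarballs):

    # TCP; the Myrinet and InfiniBand runs just swap the btl list
    mpirun -np 4 --hostfile ./hosts --mca btl tcp,self ./xhpl
    mpirun -np 4 --hostfile ./hosts --mca btl gm,self ./xhpl
    mpirun -np 4 --hostfile ./hosts --mca btl mvapi,self ./xhpl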

I am running OpenMPI on Linux (SuSE Linux Enterprise 9 SP2, x86_64), on dual Opteron 248 nodes with 2 GB/CPU.
