Hi:
I have a "cluster" consisting of a dual Opteron system (called a.lan)
and a dual AthlonMP system (b.lan). Both systems are running Red Hat
Enterprise Linux 4. The Opteron system runs in 64-bit mode; the AthlonMP
in 32-bit mode. I can't seem to make Open MPI work between these two
machines. I've tried 1.1.2, 1.1.3b1, and 1.2b1, and they all exhibit the
same behavior, namely that MPI_Bcast calls never complete. Here's my
simple.cpp test
program:

#include <iostream>
#include "mpi.h"

int main ( int argc, char* argv[] )
{
  MPI_Init( &argc, &argv );
  // MPI_Get_processor_name fills in the name and returns its length;
  // the buffer must hold at least MPI_MAX_PROCESSOR_NAME chars.
  char hostname[MPI_MAX_PROCESSOR_NAME];
  int hostname_size;
  MPI_Get_processor_name( hostname, &hostname_size );
  std::cout << "Running on " << hostname << std::endl;

  // Broadcast a single double from rank 0 to all ranks.
  std::cout << hostname << " into Bcast" << std::endl;
  double a = 3.14159;
  MPI_Bcast( &a, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD );
  std::cout << hostname << " out of Bcast" << std::endl;

  MPI_Finalize();
  return 0;
}
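
For reference, I build this with Open MPI's C++ wrapper compiler, along
the lines of (the exact wrapper name may vary with the installation):

  mpic++ simple.cpp -o simple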

I run it with "mpirun --host a.lan --host b.lan simple". Generally, if
I launch from a.lan, I see:

Running on a.lan
a.lan into Bcast
Running on b.lan
a.lan out of Bcast
b.lan into Bcast
<then both processes hang, with the one on b.lan at 100% CPU>

If I launch from b.lan, then the reverse happens (i.e., the process on
b.lan exits the Bcast, but the one on a.lan never does, and it spins at
100% CPU).

On the other hand, I have another 32-bit system (just a plain Athlon
running RHEL 4, called c.lan). My test program runs fine between b.lan
and c.lan.

I feel like I must be making an incredibly obvious mistake.

Thanks,
Allen

-- 
Allen Barnett
Transpire, Inc.
E-Mail: al...@transpireinc.com
Ph: 518-887-2930
