I'm new to openMPI and am trying to set it up for use with xgrid. I have read that v1.3 and v1.4 are broken on OS X 10.5 and 10.6, although I have seen some discussions in the archives of this mailing list saying that some people have v1.4 running on 10.6.
I have now compiled both openMPI 1.2 and openMPI 1.5rc, and neither works for me with xgrid, even though both say they support it. The failure modes are different. Does anyone know how to get a working install?

I am building on an OS X 10.5.8 machine. The xgrid controller is on an OS X 10.6 server machine. I have tried configuring with and without the --with-xgrid option.

Behaviour of openMPI 1.2

$ /usr/local/openmpi/bin/mpirun -nolocal -n 2 /bin/hostname

The job appears in the xgrid queue, and the logs show it is running on a remote machine. However, nothing ever happens, and peeking in the xgrid results I see:

$ xgrid -job results -id 8703
[brio.llnl.gov:38789] [0,0,1]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed: Operation timed out (60) - retrying
[brio.llnl.gov:38792] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed: Operation timed out (60) - retrying

Perhaps a firewall issue?

Of course, I'm more interested in getting the new openMPI 1.5 working. When I run this, I again get an entry in the queue and the job runs on a remote machine, but I get a "job failed" message:

$ /usr/local/openmpi5/bin/mpirun -n 2 /bin/hostname
$ xgrid -job results -id 8702
[brio.llnl.gov:38776] Error: unknown option "-mca"

----

Note that I have NOT installed openMPI on any of the other computers in the grid. So perhaps that is the problem? If I did install it on the other computers, how would I tell mpirun the path to the install point?

----

Finally, in both cases I don't see any way to pass xgrid-specific arguments on the mpirun command line. An xgrid controller divides the agents into sets of logical grids, and you need to specify which logical grid to submit the job to. In xgrid CLI syntax one writes "xgrid -gid 2" for grid 2. When I use openMPI, all the jobs get sent to the default grid, which is the grid that xgrid uses if no gid is specified.
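For comparison, here is roughly what a direct xgrid submission to a specific logical grid looks like, as far as I understand the xgrid man page. This is only a sketch: the controller hostname is a placeholder rather than my real setup, authentication options are left out, and the script just prints the command unless RUN=1 is set. It is this kind of grid targeting that I can't find an equivalent for on the mpirun side.

```shell
#!/bin/sh
# Placeholder values (assumptions, not my actual configuration).
CONTROLLER="controller.example.com"
GRID_ID=2

# Direct xgrid submission to logical grid 2, bypassing mpirun entirely.
# Authentication options are omitted here.
SUBMIT_CMD="xgrid -h $CONTROLLER -job submit -gid $GRID_ID /bin/hostname"

# Dry run by default: print the command. Set RUN=1 to actually submit.
if [ "${RUN:-0}" = "1" ]; then
    $SUBMIT_CMD
else
    echo "$SUBMIT_CMD"
fi
```

Run directly, the job lands in grid 2; the same /bin/hostname job submitted through mpirun always ends up in the default grid.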