Hi,

On 15.04.2011 at 06:53, Derrick LIN wrote:
> I am trying to setup a small SGE cluster with OpenMPI integrated but I am
> totally stuck when trying to run a openmpi job to the SGE's PE.
>
> I mainly followed the guide sge-snow.pdf from Revolutions Computing and
> http://idolinux.blogspot.com/2010/04/quick-install-of-open-mpi-with-grid.html

What is your SGE configuration (the output of `qconf -sconf`)?

> <snip>
> For troubleshooting I have done several things below:
>
> 1) passwordless SSH has been configurated properly for the execution hosts
> and the queue master.
>
> pwbcad@sgeqmast01:~$ ssh sgeqexec01 uptime
> 14:35:54 up 2:47, 1 user, load average: 0.10, 0.08, 0.02

a) You are testing from the master to a node, but parallel jobs run between nodes.

b) Unless you need X11 forwarding, SGE’s -builtin- communication works fine. This way you can have a cluster without `rsh` or `ssh` (or with them limited to admin staff) and still run parallel jobs.

> 2) I could run a openmpi job outside the SGE successfully.
>
> mpirun -host n1, n2 -np 8 ./ompi_job
>
> 3) I submitted job to a queue directly instead of a PE, the job could run and
> completed successfully
>
> qsub -q dev.q ./ompi_job.sh

Then you are bypassing SGE’s slot allocation, and you will have wrong accounting and no job control of the slave tasks.

-- Reuti

> 4) Although I don't think PATH and LD_LIBRARY_PATH would cause issues in
> ubuntu, I still add OpenMPI binaries and libraries to both. But it didn't
> help.
>
> It will be very appreciated if anyone can share their experience!
>
> Derrick
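
To illustrate the point about going through a PE instead of submitting directly to a queue, here is a minimal sketch of a job script that checks what SGE actually granted before starting `mpirun`. It is not taken from the thread: the PE name `orte` and the helper `count_pe_slots` are hypothetical, and it assumes Open MPI was built with SGE support (`ompi_info | grep gridengine` should list a component if so).

```shell
#!/bin/sh
# Sketch of a job script for an Open MPI job started through an SGE PE.
# Assumed submission: qsub -pe orte 8 ./ompi_job.sh
# ("orte" is a hypothetical PE name; list the configured ones with `qconf -spl`.)

# Sum the slot counts in an SGE PE hostfile.
# Each line has the form: <hostname> <slots> <queue> <processor-range>
count_pe_slots() {
    awk '{ total += $2 } END { print total + 0 }' "$1"
}

if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    echo "Granted hosts and slots:"
    cat "$PE_HOSTFILE"
    echo "Slots in hostfile: $(count_pe_slots "$PE_HOSTFILE"), NSLOTS=$NSLOTS"
    # With an SGE-aware Open MPI build, no -host or -hostfile option is needed;
    # mpirun picks up the granted hosts from SGE.
    mpirun -np "$NSLOTS" ./ompi_job
else
    # This is the situation of `qsub -q dev.q ...`: no PE, so no slot allocation.
    echo "PE_HOSTFILE is not set: job was not started through a PE" >&2
fi
```

If the hostfile total and `NSLOTS` agree with the number requested via `-pe`, SGE's slot allocation is in effect for the job.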