Hi,

On 15.04.2011 at 06:53, Derrick LIN wrote:

> I am trying to set up a small SGE cluster with Open MPI integrated, but I am 
> totally stuck when trying to submit an Open MPI job to SGE's PE.
> 
> I mainly followed the guide sge-snow.pdf from Revolutions Computing and 
> http://idolinux.blogspot.com/2010/04/quick-install-of-open-mpi-with-grid.html

- What is your SGE configuration (`qconf -sconf`)?
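For reference, the settings that matter here can be dumped with standard `qconf` calls. The PE name `orte` is only a placeholder (use whatever name your job requests); `dev.q` is the queue from your post. These need a working SGE installation, so run them on the qmaster:

```sh
qconf -sconf      # global configuration (rsh_command, rsh_daemon, ...)
qconf -spl        # list all parallel environments
qconf -sp orte    # show one PE; "orte" is a placeholder name
qconf -sq dev.q   # show the queue; its pe_list must contain the PE
```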


> <snip>
> For troubleshooting I have done several things below:
> 
> 1) Passwordless SSH has been configured properly for the execution hosts 
> and the queue master.
> 
> pwbcad@sgeqmast01:~$ ssh sgeqexec01 uptime
>  14:35:54 up  2:47,  1 user,  load average: 0.10, 0.08, 0.02

a) You are testing from the master to a node, but a parallel job starts its processes from one execution node to the others.

b) Unless you need X11 forwarding, SGE’s `builtin` communication works 
fine. This way you can have a cluster without `rsh` or `ssh` (or with them 
limited to admin staff) and still run parallel jobs.
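As a sketch, the `builtin` method corresponds to these entries in the global configuration (shown by `qconf -sconf`, edited with `qconf -mconf`). They should already be the defaults on an SGE 6.2 installation, but check what your cluster actually reports:

```
rlogin_daemon     builtin
rlogin_command    builtin
rsh_daemon        builtin
rsh_command       builtin
qlogin_daemon     builtin
qlogin_command    builtin
```

Assuming Open MPI was built with SGE support, it starts its remote daemons through `qrsh -inherit`, which goes through the `rsh_*` entries above, so no passwordless `ssh` between the nodes is required.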


> 2) I could run an Open MPI job outside SGE successfully. 
> 
> mpirun -host n1,n2 -np 8 ./ompi_job
> 
> 3) I submitted the job to a queue directly instead of a PE; the job ran and 
> completed successfully.
> 
> qsub -q dev.q ./ompi_job.sh

Then you are bypassing SGE’s slot allocation, and you will get wrong accounting 
and no job control over the slave tasks.
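A minimal sketch of the tightly integrated variant, roughly along the lines of Open MPI's SGE FAQ. Assumptions: a PE named `orte` (created with `qconf -ap orte`, with `control_slaves TRUE` and `job_is_first_task FALSE`) is listed in the `pe_list` of `dev.q` (`qconf -mq dev.q`), and Open MPI was configured `--with-sge` (`ompi_info | grep gridengine` should then list a component). The script name is hypothetical:

```sh
#!/bin/sh
# Hypothetical job script ompi_pe_job.sh; submit with: qsub ompi_pe_job.sh
# Request 8 slots from the (assumed) PE "orte" instead of a queue via -q
#$ -pe orte 8
#$ -cwd
#$ -S /bin/sh
# SGE lists the granted hosts and slot counts in $PE_HOSTFILE
cat "$PE_HOSTFILE"
# An SGE-aware Open MPI reads this allocation itself, so no -host list is given;
# $NSLOTS is set by SGE to the number of granted slots
mpirun -np "$NSLOTS" ./ompi_job
```

If the hosts printed from `$PE_HOSTFILE` look right but `mpirun` still fails, the PE definition and the `rsh_*` settings are the next places to look.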

-- Reuti


> 4) Although I don't think PATH and LD_LIBRARY_PATH would cause issues on 
> Ubuntu, I still added the OpenMPI binaries and libraries to both, but it 
> didn't help.
> 
> It would be much appreciated if anyone could share their experience!
> 
> Derrick
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

