On Aug 10, 2009, at 13:01 PM, Gus Correa wrote:

Hi Jody

We don't have Mac OS X here, only Linux, so I'm not sure this applies to you.

Did you configure your OpenMPI with Torque support,
and point it to the same library that provides the
Torque you are using (--with-tm=/path/to/torque-library-directory)?

Not explicitly. I'll check into that....
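
One way to check this, as a rough sketch: ompi_info lists the components an Open MPI installation was built with, and a build with Torque support should list components named "tm". The helper name has_tm_support is made up for illustration, and the exact layout of the ompi_info output is an assumption that may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper: reads `ompi_info` output on stdin and succeeds if
# any MCA component named "tm" (the Torque/PBS interface) is listed.
# Assumes lines of the form "MCA <framework>: tm (...)".
has_tm_support() {
    grep -Eq 'MCA [a-z]+: tm( |$)'
}

# Only run the check if ompi_info is on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "tm components found: this build has Torque support"
    else
        echo "no tm components found: consider rebuilding with"
        echo "  ./configure --with-tm=/path/to/torque-library-directory"
    fi
fi
```

Make sure the ompi_info you run comes from the same installation as the mpirun used in the job.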


Are you using the right mpirun? (There are so many out there.)

Yeah, I use the explicit path and moved the OS X one.
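
For anyone wanting to double-check which copies are visible, here is a small sketch that lists every mpirun on the PATH in search order. The function name find_all_on_path is hypothetical, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print every executable file named "$1" found in the
# directories of $PATH, in the order the shell would search them.
find_all_on_path() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        candidate="${dir:-.}/$1"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    IFS=$old_ifs
}

# The first line printed is the one a bare "mpirun" would run.
find_all_on_path mpirun
```

If more than one path is printed, the copy inside the batch job may differ from the one in an interactive shell, since the job's PATH can be different.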

Thanks!  Jody

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Jody Klymak wrote:
Hi All,
I've been trying to get Torque PBS to work on my OS X 10.5.7 cluster with Open MPI (after finding that Xgrid was pretty flaky about connections). I *think* this is an MPI problem (perhaps via operator error!)
If I submit an Open MPI job with:
#PBS -l nodes=2:ppn=8
mpirun MyProg
PBS locks off two of the processors, as confirmed by "pbsnodes -a" and the job output. But mpirun runs the whole job on the second of the two processors.
If I run the same job without qsub (i.e., using ssh):
mpirun -n 16 -host xserve01,xserve02 MyProg
it runs fine on all the nodes....
My /var/spool/torque/server_priv/nodes file looks like:
xserve01.local np=8
xserve02.local np=8
Any idea what could be going wrong or how to debug this properly? There is nothing suspicious in the server or mom logs.
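
A possible way to narrow this down, sketched under assumptions: inside a job, Torque sets PBS_NODEFILE to a file with one line per allocated slot. Comparing that file with the hosts where mpirun actually starts processes shows whether mpirun sees the allocation. count_slots is a hypothetical helper, and hostname stands in for MyProg.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8

# Hypothetical helper: print "<count> <host>" for each distinct host
# listed in the node file given as "$1".
count_slots() {
    sort "$1" | uniq -c | awk '{print $1, $2}'
}

# PBS_NODEFILE is only set inside a Torque job, so guard against running
# this script outside of one.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Slots Torque allocated:"
    count_slots "$PBS_NODEFILE"

    if command -v mpirun >/dev/null 2>&1; then
        echo "Hosts mpirun launched on:"
        mpirun hostname | sort | uniq -c
    fi
fi
```

If the first list shows both hosts but the second shows only one, mpirun is probably not reading the Torque allocation.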
Thanks for any help,
Jody
--
Jody Klymak
http://web.uvic.ca/~jklymak/
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jody Klymak
http://web.uvic.ca/~jklymak/