Hi All,

I am building open-mpi-1.3.2 on centos-3.4, with torque-1.1.0p2-aspen3 and 
myrinet. I compiled it just fine with this configuration:
./configure --prefix=/home/software/ompi/1.3.2-pgi \
    --with-gm=/usr/local/ --with-gm-libdir=/usr/local/lib64/ \
    --enable-static --disable-shared \
    --with-tm=/usr/ --without-threads \
    CC=pgcc CXX=pgCC FC=pgf90 F77=pgf77 \
    LDFLAGS=-L/usr/lib64/torque/

However, when I submit jobs for 2 or more nodes through the torque scheduler,
the jobs just hang. They show the RUN state, but there is no communication
between the nodes, and the jobs eventually die with a timeout.

We have confirmed that the myrinet is working because our lam-mpi-7.1 works
just fine. We are having a really hard time determining the cause of this
problem, so we suspect it's because our torque is too old.

What is the minimum torque version required for open-mpi-1.3.2? The README
file doesn't specify this. Does anyone know more about it?

Thanks in advance,

Kai
--------------------
Kai Song
<ks...@lbl.gov> 1.510.486.4894
High Performance Computing Services (HPCS) Intern
Lawrence Berkeley National Laboratory - http://scs.lbl.gov
