Hello!

We are having problems integrating BLCR + OpenMPI + LSF on a Linux cluster
with InfiniBand.

We compiled OpenMPI version 1.6 with gcc version 4.6.0. The configure
line was:

./configure --prefix=/opt/share/mpi-openmpi/1.6-gcc-4.6.0/el6/x86_64 \
    --with-lsf --with-openib --with-blcr=/opt/share/blcrv0.8.4.app/ \
    --with-ft=cr --enable-ft-thread --enable-opal-multi-threads --with-psm

The problem I am having is that, for some reason, the ft-enable-cr feature
freezes my MPI application when I use more than one node. The job never
starts.

We narrowed the problem down and noticed that when mpirun is used outside
the batch system, it works. But as soon as mpirun detects the environment
variable LSB_JOBID and assumes it is running under LSF, the problem
appears. Additionally, if we use "--mca plm rsh", which should deactivate
the LSF integration, it works again, as expected.
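
To make the comparison concrete, here is a minimal sketch of the workaround
as a shell helper. It only prints the mpirun command line; it does not run
anything. The function name build_mpirun_cmd, the "-np 4" count, and the
./my_mpi_app binary are placeholders, and "-am ft-enable-cr" is assumed to
be how checkpoint/restart is switched on in this build.

```shell
#!/bin/sh
# Sketch only: print the mpirun command line we would use.
# build_mpirun_cmd is a hypothetical helper, not part of Open MPI or LSF.
build_mpirun_cmd() {
    if [ -n "${LSB_JOBID:-}" ]; then
        # LSB_JOBID is set, so mpirun would pick the LSF launcher.
        # Force the rsh launcher instead, which is the case that works for us.
        echo "mpirun --mca plm rsh -am ft-enable-cr $*"
    else
        # Outside the batch system the default launcher already works.
        echo "mpirun -am ft-enable-cr $*"
    fi
}

# Example (placeholder arguments): show the command for the current shell.
build_mpirun_cmd -np 4 ./my_mpi_app
```

Inside an LSF job the printed line contains "--mca plm rsh"; in a plain
login shell it does not.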

So our guess is: either there is something misconfigured in our LSF setup,
or there is a problem in the plm module inside OpenMPI.

Any hints?

Thanks!!

Jorge Naranjo
