Jean,

I am not able to reproduce this problem on a non-threaded build. Can you try taking a fresh source package and configuring it without thread support? I am wondering if this is simply a threading issue. I did note that you said you configured both with and without threads, but please run configure on a fresh source tree, not on one that had previously been configured with thread support.
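
For a clean non-threaded build, something along these lines should do it (same tarball and prefix you mentioned; treat the exact steps as a sketch):

 tar xjf openmpi-1.1a1r8727.tar.bz2
 cd openmpi-1.1a1r8727
 ./configure --prefix=/opt/ompi    # no --enable-mpi-threads / --enable-progress-threads
 make all install

That way nothing left over from a previously threaded configure can leak into the build.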

Thanks,

Galen


On Jan 18, 2006, at 2:13 PM, Jean-Christophe Hugly wrote:


Hi,

I have been trying for the past few days to get an MPI application (the
pallas bm) to run with ompi and openib.

My environment:
===============
. two quad cpu hosts with one mlx hca each.
. the hosts are running suse10 (kernel 2.6.13) with the latest (or close to
it) from openib (rev 4904, specifically)
. opensm runs on third machine with the same os.
. openmpi is built from openmpi-1.1a1r8727.tar.bz2

Behaviour:
==========
. openib seems to behave ok (ipoib works, rdma_bw and rdma_lat work, osm
works)
. I can mpirun any non-mpi program like ls, hostname, or ompi_info all
right.
. I can mpirun the pallas bm on any single host (the local one or the
other)
. I can mpirun the pallas bm on the two nodes, provided that I disable
the openib btl (see the example command after this list)
. If I try to use the openib btl, the bm does not start (at best I get
the initial banner, sometimes not even that). On both hosts, I see that the PMB
processes (the correct number for each host) use 99% cpu.
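
(To be precise about "disable the openib btl": I mean excluding it at run time, along the lines of

 mpirun -prefix /opt/ompi -wdir `pwd` -machinefile /root/machines -np 2 --mca btl ^openib PMB-MPI1

i.e. the same exclusion syntax I use for tcp in mca-params.conf below; the exact command is only an illustration.)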

I obtained the exact same behaviour with the following src packages:
 openmpi-1.0.1.tar.bz2
 openmpi-1.0.2a3r8706.tar.bz2
 openmpi-1.1a1r8727.tar.bz2

Earlier on, I also did the same experiment with openmpi-1.0.1 and the
stock gen2 of the suse kernel; same thing.

Configuration:
==============
For building, I tried the following variants:

./configure --prefix=/opt/ompi --enable-mpi-threads --enable-progress-threads
./configure --prefix=/opt/ompi
./configure --prefix=/opt/ompi --disable-smp-locks

I also tried many variations of mca-params.conf. What I normally use when trying openib is:
rmaps_base_schedule_policy = node
btl = ^tcp
mpi_paffinity_alone = 1

The mpirun cmd I normally use is:
mpirun -prefix /opt/ompi -wdir `pwd` -machinefile /root/machines -np 2 PMB-MPI1
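
(To select openib explicitly instead, I restrict the btl list on the command line, with something like

 mpirun -prefix /opt/ompi -wdir `pwd` -machinefile /root/machines -np 2 --mca btl openib,self PMB-MPI1

where the exact btl values are only an illustration.)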

My machine file being:
bench1 slots=4 max-slots=4
bench2 slots=4 max-slots=4

Am I doing something obviously wrong?

Thanks for any help!

--
Jean-Christophe Hugly <j...@pantasys.com>
PANTA
