Re: [OMPI users] error performing MPI_Comm_spawn

2009-12-15 Thread Ralph Castain
Okay, I can replicate this. FWIW: your test program works fine with the OMPI trunk and 1.3.3. It only has a problem with 1.4. Since I can replicate it on multiple machines every single time, I don't think it is actually a race condition. I think someone made a change to the 1.4 branch that cre

Re: [OMPI users] [visit-developers] /usr/bin/ld: cannot find -lrdmacm on 9184

2009-12-15 Thread tom fogal
Simon Su writes: > Hi Tom, > > I am using the standard openmpi package that runs on all the cluster > machines here at Princeton. So, maybe I shouldn't touch openmpi. But, > removing -lrdmacm from the MPI_LIBS line in the machinename.conf file > worked. Any implications from doing this? The only t

Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?

2009-12-15 Thread Ralph Castain
Finally got time to look at this - not sure this is a bug, if I understand your scenario correctly. When you say the application exits, do you mean it calls "exit" - or do you mean it segfaults or some other such abnormal termination? Reason I ask: if the process has not yet called MPI_Init and

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-15 Thread Eugene Loh
Matthew MacManes wrote: I would be happy to help troubleshoot, but I am not enough of a programmer to know how. The hang is reproducible, and -mca btl ^sm is about 15% faster. If you want to shoot me some instructions off list, I can give it a go. The application that I am working with, prima

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-15 Thread Matthew MacManes
I would be happy to help troubleshoot, but I am not enough of a programmer to know how. The hang is reproducible, and -mca btl ^sm is about 15% faster. If you want to shoot me some instructions off list, I can give it a go. The application that I am working with, primarily, is ABySS: http://www

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-15 Thread Eugene Loh
Matthew MacManes wrote: On my system, mpirun -np 8 -mca btl_sm_num_fifos 7 is much slower (and appeared to hang after several thousand iterations) than -mca btl ^sm If the hang is reproducible, we should perhaps have a look. Also, the fact that it's much slower is interesting. Can you c

Re: [OMPI users] OpenFOAM fail to run under openmpi-1.3.3 on 2x Ubuntu 9.10 x64 Server

2009-12-15 Thread Ralph Castain
Your path on a remote node is wrong and so OMPI cannot find the required OMPI executable ("orted") to launch your job. Check your path to ensure that it is getting set up correctly on your remote nodes - every node needs to see your installed OMPI binaries and libraries. On Dec 15, 2009, at 10:4

[OMPI users] OpenFOAM fail to run under openmpi-1.3.3 on 2x Ubuntu 9.10 x64 Server

2009-12-15 Thread Dmitry Zaletnev
Hi, I've got the message: bash: orted: command not found A daemon (pid 1550) died unexpectedly with status 127 while attempting to launch so we are aborting when I tried to run OpenFOAM in parallel on 2x Ubuntu 9.10 x64 Server nodes the same way as it runs OK on 2x OpenSUSE 11.1 x64 nodes. Anothe

Re: [OMPI users] NFS and openmpi through different NICs

2009-12-15 Thread Gus Correa
Hi Dmitry Yes, we do it here. Besides Bill's recommendations you can also use different host/interface names for, say, eth0 and eth1, in /etc/hosts. Moreover, you should set OpenMPI to use only the specific subnet/ports you want to dedicate to MPI, say, eth1. See these FAQs: http://www.open-mpi

Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?

2009-12-15 Thread Katz, Jacob
Ralph, Have you been able to confirm this as a bug? Thanks! Jacob M. Katz | jacob.k...@intel.com | Work: +972-4-865-5726 | iNet: (8)-465-5726 From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Cast

Re: [OMPI users] error performing MPI_Comm_spawn

2009-12-15 Thread Marcia Cristina Cera
Thank you, Ralph I will use 1.3.3 for now... while waiting for a future fix release that breaks this race condition. márcia On Tue, Dec 15, 2009 at 12:58 PM, Ralph Castain wrote: > Looks to me like it is a race condition, and the timing between 1.3.3 and > 1.4 is just enough to trip it. I

Re: [OMPI users] error performing MPI_Comm_spawn

2009-12-15 Thread Ralph Castain
Looks to me like it is a race condition, and the timing between 1.3.3 and 1.4 is just enough to trip it. I can break the race, but it will have to be in a future fix release. Meantime, your best bet is to either stick with 1.3.3 or add the delay. On Dec 15, 2009, at 5:51 AM, Marcia Cristina Cer

[OMPI users] error performing MPI_Comm_spawn

2009-12-15 Thread Marcia Cristina Cera
Hi, I intend to develop an application using MPI_Comm_spawn to dynamically create new MPI tasks (or processes). The structure of the program is like a tree: each node creates 2 new ones until it reaches a predefined number of levels. I developed a small program to explain my problem as can be se

Re: [OMPI users] checkpoint opempi-1.3.3+sge62

2009-12-15 Thread Sergio Díaz
Hi, Thanks Reuti. These links were very useful when I did the integration of BLCR with SGE. I will review them to check if there is more useful information. Regards, Sergio Reuti wrote: Hi, no, I never tried Open MPI's checkpointing. But there are two Howto's from which you may get som