[OMPI users] OpenMPI program getting stuck at poll()

2009-03-09, Prasanna Ranganathan
Hi all, I have a distributed program running on 400+ nodes and using OpenMPI. I have previously run the same binary, with nearly the same setup, successfully. However, in my last two runs the program seems to get stuck after a while, before it completes. The stack trace at the time it gets stu

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-15, Prasanna Ranganathan
> Please send me your /etc/make.conf and the contents of /var/db/pkg/sys-cluster/openmpi-1.2.7/
>
> You can package this with the following command line:
>
> tar -cjf data.tbz /etc/make.conf /var/db/pkg/sys-cluster/openmpi-1.2.7/
>
> And simply send

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-12, Prasanna Ranganathan
Hi, I did make sure at the beginning that only eth0 was activated on all the nodes. Nevertheless, I am currently verifying the NIC configuration on all the nodes and making sure things are as expected. While trying different things, I did come across this peculiar error which I had detailed in o
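
One way to script the per-node NIC check mentioned above, offered only as a sketch under assumptions: the nodes run Linux, where each interface's state is exposed in /sys/class/net/<name>/operstate, and the script is run on (or over ssh against) every node. The function names are hypothetical, not part of Open MPI.

```python
import os


def up_interfaces(sysfs_net: str = "/sys/class/net") -> list:
    """List interfaces whose Linux operstate is 'up' (assumes a Linux sysfs layout).

    Loopback usually reports 'unknown' rather than 'up', so it is not listed.
    """
    up = []
    for name in sorted(os.listdir(sysfs_net)):
        state_file = os.path.join(sysfs_net, name, "operstate")
        try:
            with open(state_file) as fh:
                state = fh.read().strip()
        except OSError:
            continue  # not an interface directory, or unreadable
        if state == "up":
            up.append(name)
    return up


def only_expected_up(expected: str = "eth0", sysfs_net: str = "/sys/class/net") -> bool:
    """True when the expected interface is the only one reporting 'up'."""
    return up_interfaces(sysfs_net) == [expected]
```

A node for which only_expected_up() returns False has some interface other than eth0 active and is worth a closer look.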

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-12, Prasanna Ranganathan
Hi, I have verified that the OpenMPI version is 1.2.7 on all the nodes, and "ompi_info | grep thread" reports "Thread support: posix (mpi: no, progress: no)" on these machines. I get the error both with and without -mca oob_tcp_listen_mode listen_thread. Sometimes the startup takes too long with the liste
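
Comparing ompi_info output across hundreds of nodes by eye is error-prone. A hedged sketch of automating it, assuming the output has already been collected per node (collection, e.g. over ssh, is not shown) and that the fields follow the "Key: value" layout of the Thread support line above, with the version on a line such as "Open MPI: 1.2.7". The helper names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def parse_ompi_info(text: str) -> dict[str, str]:
    """Turn 'Key: value' lines of ompi_info output into a dict (first occurrence wins)."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        key = key.strip()
        if sep and key and key not in fields:
            fields[key] = value.strip()
    return fields


def outliers(per_node: dict[str, str], key: str) -> dict[str, str | None]:
    """Return nodes whose value for `key` differs from the most common value."""
    values = {node: parse_ompi_info(text).get(key) for node, text in per_node.items()}
    if not values:
        return {}
    majority, _ = Counter(values.values()).most_common(1)[0]
    return {node: v for node, v in values.items() if v != majority}
```

For example, outliers(outputs, "Thread support") should come back empty if every node reports the same threading configuration.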

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-11, Prasanna Ranganathan
> On Sep 10, 2008, at 9:29 PM, Prasanna Ranganathan wrote:
>> I have upgraded to 1.2.7 and am still noticing the issue.
>
> FWIW, we didn't change anything wit

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-10, Prasanna Ranganathan
> P socket timeouts (which are important when dealing with large numbers of MPI processes).
>
> On Sep 8, 2008, at 4:36 PM, Prasanna Ranganathan wrote:
>> Hi,
>>
>> I am trying to run a test mpiHelloWorld program that simply initializes the

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-10, Prasanna Ranganathan
Hi Eric, Thanks a lot for the reply. I am currently working on upgrading to 1.2.7. I do not quite follow your directions: what do you mean when you say "try with USE=-threads..."? Kindly excuse me if it is a silly question, and pardon my ignorance :D Regards, Prasanna.

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-10, Prasanna Ranganathan
> of MPI processes).
>
> On Sep 8, 2008, at 4:36 PM, Prasanna Ranganathan wrote:
>> Hi,
>>
>> I am trying to run a test mpiHelloWorld program that simply initializes the MPI environment on all the nodes, prints the hostname and rank of eac

Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-09, Prasanna Ranganathan
Hi Jeff/Paul, Thanks a lot for your replies. I am looking into upgrading to a newer version of OpenMPI. My main parallel application uses a few custom-built libraries that recommend 1.1.2, so I first need to check their compatibility with the newer version before I can upgrade.

[OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-08, Prasanna Ranganathan
Hi, I am trying to run a test mpiHelloWorld program that simply initializes the MPI environment on all the nodes, prints the hostname and rank of each process in the MPI process group, and exits. I am using OpenMPI 1.1.2 and am running 997 processes on 499 nodes (each node has two dual-core CPUs). I get the
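
The mpiHelloWorld source itself isn't included. As a hedged sketch of the behaviour described (initialize MPI, print the host name and rank, exit), here is a Python version using the mpi4py bindings. mpi4py is an assumption (the original program was presumably C), and the non-MPI fallback exists only so the script also runs where mpi4py is missing.

```python
import socket

try:
    # Importing mpi4py.MPI calls MPI_Init by default and registers
    # MPI_Finalize to run when the interpreter exits.
    from mpi4py import MPI
except ImportError:  # assumption: fall back to a lone, non-MPI process
    MPI = None


def hello_line(hostname: str, rank: int, size: int) -> str:
    """Format one process's report: its host name and its rank in the group."""
    return f"{hostname}: rank {rank} of {size}"


def main() -> None:
    if MPI is not None:
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    else:
        rank, size = 0, 1  # what a singleton (non-mpirun) launch would report
    print(hello_line(socket.gethostname(), rank, size), flush=True)


if __name__ == "__main__":
    main()
```

Launched with something like "mpirun -np 997 --hostfile hosts python hello.py" (the hostfile name is hypothetical), each of the 997 processes should print one line, so a run that stalls before all lines appear points at startup or wire-up rather than application logic.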