Re: [OMPI users] Cannot run a simple MPI program

2016-04-24 Thread Gilles Gouaillardet
As far as I am concerned, I would consider that a bug: since the link is down, the psm component should simply disqualify itself. I will follow up on this on the devel ML.

Cheers,
Gilles

On 4/25/2016 10:36 AM, dpchoudh . wrote:
> Hello Gilles
> Thank you for finding the bug; it was not ther…

Re: [OMPI users] Cannot run a simple MPI program

2016-04-24 Thread dpchoudh .
Hello George

Adding --mca pml ob1 does make the program run. I just wanted to make sure that was the expected behaviour (as opposed to a bug in mpirun).

Thanks
Durga

1% of the executables have 99% of CPU privilege! Userspace code! Unite!! Occupy the kernel!!!

On Sun, Apr 24, 2016 at 9:43 PM, Ge…

Re: [OMPI users] Cannot run a simple MPI program

2016-04-24 Thread George Bosilca
Add --mca pml ob1 to your mpirun command.

George

On Sunday, April 24, 2016, dpchoudh . wrote:
> Hello Gilles
>
> Thank you for finding the bug; it was not there in the original code; I
> added it while trying to 'simplify' the code.
>
> With the bug fixed, the code now runs in the last scenario…
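Concretely, George's advice amounts to forcing Open MPI's ob1 point-to-point layer so that the failing psm path is never selected. A minimal sketch of the invocation (the executable name ./hello is hypothetical):

```shell
# Force the ob1 PML (sidesteps the cm/psm path tied to the down link);
# ./hello stands in for whatever MPI program you are running.
mpirun --mca pml ob1 -np 4 ./hello

# The same MCA parameter can be set via the environment instead:
export OMPI_MCA_pml=ob1
mpirun -np 4 ./hello
```

Setting `OMPI_MCA_pml` in the environment is convenient when many jobs should pick up the workaround without editing each command line.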

Re: [OMPI users] Cannot run a simple MPI program

2016-04-24 Thread dpchoudh .
Hello Gilles

Thank you for finding the bug; it was not there in the original code; I added it while trying to 'simplify' the code.

With the bug fixed, the code now runs in the last scenario. But it still hangs with the following command line (even after updating to latest git tree, rebuilding and…

Re: [OMPI users] Cannot run a simple MPI program

2016-04-24 Thread Gilles Gouaillardet
Two comments:

- the program is incorrect: slave() should MPI_Recv(..., MPI_ANY_TAG, ...)
- current master uses pmix114, and your traces mention pmix120, so your master is out of sync, or pmix120 is an old module that was not manually removed.

FWIW, once in a while, I rm -rf /.../ompi_in…

[OMPI users] Cannot run a simple MPI program

2016-04-24 Thread dpchoudh .
Hello all

Attached is a simple MPI program (a modified version of a similar program that was posted by another user). This program, when run on a single-node machine, hangs most of the time, as follows (in all cases, the OS was CentOS 7):

Scenario 1: OMPI v1.10, single-socket quad-core machine, with…

Re: [OMPI users] track progress of a mpi gather

2016-04-24 Thread Gilles Gouaillardet
As far as I understand, the tcp btl is ok.

Cheers,
Gilles

On Monday, April 25, 2016, dpchoudh . wrote:
> Hello Gilles
>
> That idea crossed my mind as well, but I was under the impression that
> MPI_THREAD_MULTIPLE is not very well supported on OpenMPI? I believe it is
> not supported on OpenIB…

Re: [OMPI users] track progress of a mpi gather

2016-04-24 Thread dpchoudh .
Hello Gilles

That idea crossed my mind as well, but I was under the impression that MPI_THREAD_MULTIPLE is not very well supported on OpenMPI? I believe it is not supported on OpenIB, but the original poster seems to be using TCP. Does it work for TCP?

Thanks
Durga

1% of the executables have 99%…

Re: [OMPI users] track progress of a mpi gather

2016-04-24 Thread Gilles Gouaillardet
Another option is to use MPI_THREAD_MULTIPLE, and MPI_Recv() on the master task in a dedicated thread, and use a unique tag (or MPI_Comm_dup() MPI_COMM_WORLD) to separate the traffic. If this is not the desired design, then the master task has to post MPI_Irecv() and "poll" with MPI_Probe() / MPI…

Re: [OMPI users] track progress of a mpi gather

2016-04-24 Thread dpchoudh .
Hello

I am not sure I am understanding your requirements correctly, but based on what I think it is, how about this: you do an MPI_Send() from all the non-root nodes to the root node and pack all the progress-related data into this send. Use a special tag for this message to make it stand out from…

[OMPI users] track progress of a mpi gather

2016-04-24 Thread MM
Hello,

With a miniature case of 3 Linux quad-core boxes, linked via 1 Gbit Ethernet, I have a UI that runs on 1 of the 3 boxes, and that is the root of the communicator. I have a 1-second-running function on up to 10 parameters; my parameter space fits in the memory of the root; the space of it is N…