Re: [OMPI users] Problem with openmpi and infiniband

2009-01-15 Thread Biagio Lucini
Jeff Squyres wrote: On Jan 7, 2009, at 6:28 PM, Biagio Lucini wrote: [[5963,1],13][btl_openib_component.c:2893:handle_wc] from node24 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0 Ah! If we're dealing a

Re: [OMPI users] Problem with openmpi and infiniband

2009-01-12 Thread Jeff Squyres
On Jan 7, 2009, at 6:28 PM, Biagio Lucini wrote: [[5963,1],13][btl_openib_component.c:2893:handle_wc] from node24 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0 Ah! If we're dealing a RNR retry exceed

Re: [OMPI users] Problem with openmpi and infiniband

2009-01-07 Thread Biagio Lucini
The test was in fact OK; I have also verified it on 30 processors. Meanwhile I tried OMPI 1.3RC2, with which the application fails on InfiniBand. I hope this will give some clue (or at least be useful to finalise the release of Open MPI 1.3). I remind the mailing list that I use the OFED 1.2.5 re

Re: [OMPI users] Problem with openmpi and infiniband

2009-01-04 Thread Lenny Verkhovsky
Hi, just to make sure: you wrote in the previous mail that you tested IMB-MPI1 and it "reports for the last test", and the results are for "processes=6". Since you have 4 and 8 core machines, this test could have run on a single 8 core machine over shared memory and not over InfiniBand, as you
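
One way to rule out the shared-memory path described above is to leave `sm` out of the BTL list, so that ranks on the same node cannot talk over shared memory. This is a minimal sketch and not necessarily what the poster went on to suggest; the hostfile name, process count, and benchmark path are hypothetical placeholders. It only prints the command, so it can be checked before launching.

```shell
# Hypothetical placeholders: adjust to the local cluster.
NP=6
HOSTFILE=./hosts          # assumed hostfile listing at least two nodes
BENCH=./IMB-MPI1          # assumed path to the IMB binary

# Listing only openib and self leaves out sm, so traffic between
# different ranks cannot go over shared memory.
CMD="mpirun --mca btl openib,self --hostfile $HOSTFILE -np $NP $BENCH"
echo "$CMD"
```

If the benchmark still passes with this setting, the result says something about the InfiniBand path rather than about shared memory.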

Re: [OMPI users] Problem with openmpi and infiniband

2009-01-02 Thread Biagio Lucini
Pavel Shamis (Pasha) wrote: Another thing to try is a change that we made late in the Open MPI v1.2 series with regards to IB: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion Thanks, this is something worth investigating. What would be the exact syntax to

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-29 Thread Biagio Lucini
Pavel Shamis (Pasha) wrote: Your problem may well be related to the known issue with early completions. The exact syntax is: --mca pml_ob1_use_early_completion 0 Thanks, I am currently looking for the first available spot on the cluster, then I will try this. I'll let you know. Biag
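
For context, the flag quoted above is passed to `mpirun` ahead of the application. A minimal sketch, assuming a hypothetical process count and application path (neither appears in the thread); it prints the full command rather than running it.

```shell
# Hypothetical placeholders: not taken from the thread.
NP=16
APP=./my_app

# The MCA parameter and value are exactly as quoted in the message;
# everything else on the line is illustrative.
CMD="mpirun --mca pml_ob1_use_early_completion 0 -np $NP $APP"
echo "$CMD"
```

According to the FAQ entry linked in the thread, this parameter belongs to the late Open MPI v1.2 series, so it may not apply to other versions.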

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-28 Thread Pavel Shamis (Pasha)
Another thing to try is a change that we made late in the Open MPI v1.2 series with regards to IB: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion Thanks, this is something worth investigating. What would be the exact syntax to use to turn off pml_ob1_use_

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-27 Thread Biagio Lucini
Jeff Squyres wrote: Another thing to try is a change that we made late in the Open MPI v1.2 series with regards to IB: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion Thanks, this is something worth investigating. What would be the exact syntax to use to tu

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-27 Thread Biagio Lucini
Tim Mattox wrote: For your runs with Open MPI over InfiniBand, try using openib,sm,self for the BTL setting, so that shared memory communications are used within a node. It would give us another datapoint to help diagnose the problem. As for other things we would need to help diagnose the probl

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-25 Thread Jeff Squyres
Another thing to try is a change that we made late in the Open MPI v1.2 series with regards to IB: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion On Dec 24, 2008, at 10:07 PM, Tim Mattox wrote: For your runs with Open MPI over InfiniBand, try using openib,s

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Tim Mattox
For your runs with Open MPI over InfiniBand, try using openib,sm,self for the BTL setting, so that shared memory communications are used within a node. It would give us another datapoint to help diagnose the problem. As for other things we would need to help diagnose the problem, please follow th
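
The BTL setting suggested above can be given on the `mpirun` command line through the `btl` MCA parameter. A minimal sketch, assuming a hypothetical process count and application path; it prints the command so it can be inspected before launching.

```shell
# Hypothetical placeholders: not taken from the thread.
NP=16
APP=./my_app

# openib: InfiniBand between nodes; sm: shared memory within a node;
# self: a process sending to itself.
CMD="mpirun --mca btl openib,sm,self -np $NP $APP"
echo "$CMD"
```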

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Pavel Shamis (Pasha)
If the basic test runs, the installation is OK. So what happens when you try to run your application? What is the command line? What is the error message? Do you run the application on the same set of machines with the same command line as IMB? Pasha yes to both questions: the OMPI version is

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Biagio Lucini
Pavel Shamis (Pasha) wrote: Biagio Lucini wrote: Hello, I am new to this list, where I hope to find a solution for a problem that I have been having for quite a long time. I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster with Infiniband interconnects that I use and administe

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Pavel Shamis (Pasha)
Biagio Lucini wrote: Hello, I am new to this list, where I hope to find a solution for a problem that I have been having for quite a long time. I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster with Infiniband interconnects that I use and administer at the same time. The o

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-23 Thread Biagio Lucini
Hi Dorian, thank you for your message. doriankrause wrote: The trouble is with an MPI code that runs fine with an openmpi 1.1.2 library compiled without infiniband support (I have tested the scalability of the code up to 64 cores, the nodes are 4 or 8 cores, the results are exactly what I

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-23 Thread doriankrause
Hi Biagio Lucini wrote: Hello, I am new to this list, where I hope to find a solution for a problem that I have been having for quite a long time. I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster with Infiniband interconnects that I use and administer at the same time. T

[OMPI users] Problem with openmpi and infiniband

2008-12-23 Thread Biagio Lucini
Hello, I am new to this list, where I hope to find a solution for a problem that I have been having for quite a long time. I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster with Infiniband interconnects that I use and administer at the same time. The OpenFabrics stack is OFED