[OMPI users] Problem with openmpi and infiniband

2008-12-23 Thread Biagio Lucini
I am willing to do so, but in more than two months of testing/trying/hoping/praying I have accumulated so much material and information that if I posted everything in this e-mail I would be more likely to confuse a potential helper than to help him understand the problem. Thank you in advance, Biagio L

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-23 Thread Biagio Lucini
Hi Dorian, thank you for your message. doriankrause wrote: The trouble is with an MPI code that runs fine with an openmpi 1.1.2 library compiled without InfiniBand support (I have tested the scalability of the code up to 64 cores; the nodes have 4 or 8 cores; the results are exactly what I

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Biagio Lucini
Pavel Shamis (Pasha) wrote: Biagio Lucini wrote: Hello, I am new to this list, where I hope to find a solution for a problem that I have been having for quite a long time. I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster with InfiniBand interconnects that I use and

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-27 Thread Biagio Lucini
since the installation directory is non-standard (/opt/ompi128-intel/bin for the path and /opt/ompi128-intel/lib for the libs). I hope I have provided all the required info; if you need more, or some of it in more detail, please let me know. Many thanks, Biagio Lucini

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-27 Thread Biagio Lucini
Jeff Squyres wrote: Another thing to try is a change that we made late in the Open MPI v1.2 series with regard to IB: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion Thanks, this is something worth investigating. What would be the exact syntax to use to tu

Re: [OMPI users] openMPI, transfer data from multiple sources to one destination

2008-12-29 Thread Biagio Lucini
goes back to (a) above. This implementation assumes that you do not need the data in any particular order. Hope it works for you. Biagio -- Dr. Biagio Lucini, Department of Physics, Swansea University

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-29 Thread Biagio Lucini
know. Biagio -- Dr. Biagio Lucini, Department of Physics, Swansea University, Singleton Park, SA2 8PP Swansea (UK), Tel. +44 (0)1792 602284

Re: [OMPI users] Problem with openmpi and infiniband

2009-01-02 Thread Biagio Lucini
Pavel Shamis (Pasha) wrote: Another thing to try is a change that we made late in the Open MPI v1.2 series with regard to IB: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion Thanks, this is something worth investigating. What would be the exact syntax to

Re: [OMPI users] Problem with openmpi and infiniband

2009-01-07 Thread Biagio Lucini
nce you have 4- and 8-core machines, this test could be run on the same 8-core machine over shared memory and not over InfiniBand, as you suspected. You can rerun the IMB-MPI1 test with -mca btl self,openib to be sure that the test does not use shared memory or TCP. Lenny. On 12/24/08, Biagio

Re: [OMPI users] Problem with openmpi and infiniband

2009-01-15 Thread Biagio Lucini
Jeff Squyres wrote: On Jan 7, 2009, at 6:28 PM, Biagio Lucini wrote: [[5963,1],13][btl_openib_component.c:2893:handle_wc] from node24 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0 Ah! If we're de

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Biagio Lucini
ade of the firmware, although once again the OFED drivers were complaining about the firmware being too old) fixed the problem. We did both upgrades at once, so, as in Brett's case, I am not sure which one played the major role. Biagio

[OMPI users] "casual" error

2009-03-05 Thread Biagio Lucini
main? Many thanks, Biagio Lucini - [node20:04178] *** Process received signal *** [node20:04178] Signal: Segmentation fault (11) [node20:04178] Signal code: Addres

Re: [OMPI users] "casual" error

2009-03-05 Thread Biagio Lucini
messing up the memory). I suggest using a memory-checking tool such as valgrind to check the memory consistency of your application. george. On Mar 5, 2009, at 17:37, Biagio Lucini wrote: We have an application that runs for a very long time with 16 processes (the time is of the order of a few