Re: [OMPI users] Infiniband driver update must recompile openmpi?

2018-05-30 Thread Jeff Squyres (jsquyres)
> On May 29, 2018, at 10:30 PM, Kaiming Ouyang wrote: > > I have a question about recompiling openmpi. > Recently I updated the infiniband driver for network card Mellanox, but I > found original openmpi did not work anymore. Does this mean the driver update > must be followed by recompiling op

Re: [OMPI users] Infiniband errors

2012-12-20 Thread Syed Ahsan Ali
Dear Yann Here is the output *[root@compute-01-01 ~]# cat /etc/redhat-release* Red Hat Enterprise Linux Server release 5.3 (Tikanga) *[root@compute-01-01 ~]# uname -a* Linux compute-01-01.private.dns.zone 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux *[root@com

Re: [OMPI users] Infiniband errors

2012-12-19 Thread Yann Droneaud
On Wednesday, 19 December 2012 at 12:12 +0500, Syed Ahsan Ali wrote: > Dear John > > I found this output of ibstatus on some nodes (most probably the > problem causing) > [root@compute-01-08 ~]# ibstatus > > Fatal error: device '*': sys files not found > (/sys/class/infiniband/*/ports) > > Do

Re: [OMPI users] Infiniband errors

2012-12-19 Thread Shamis, Pavel
Seems like the driver was not started. I would suggest running lspci and checking whether the HCA is visible at the hardware level. Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Dec 19, 2012, at 2:12 AM, Syed Ahsan Ali wrote: Dear Joh

Re: [OMPI users] Infiniband errors

2012-12-19 Thread Syed Ahsan Ali
Dear John I found this output of ibstatus on some nodes (most probably the problem causing) [root@compute-01-08 ~]# ibstatus Fatal error: device '*': sys files not found (/sys/class/infiniband/*/ports) Does this show any hardware or software issue? Thanks On Wed, Nov 28, 2012 at 3:17 PM, Jo
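
The ibstatus error above says that nothing was found under /sys/class/infiniband/*/ports. A minimal Python sketch of the same check, assuming the usual Linux sysfs layout; the function name list_ib_ports is made up for illustration and is not part of OFED or Open MPI:

```python
import glob
import os


def list_ib_ports(sysfs_root="/sys/class/infiniband"):
    """Return {device: [port, ...]} for every HCA that exposes a ports directory."""
    ports = {}
    # Same pattern that the ibstatus error message refers to.
    for ports_dir in sorted(glob.glob(os.path.join(sysfs_root, "*", "ports"))):
        device = os.path.basename(os.path.dirname(ports_dir))
        ports[device] = sorted(os.listdir(ports_dir))
    return ports


if __name__ == "__main__":
    found = list_ib_ports()
    if not found:
        # An empty result matches the "sys files not found" case in the thread.
        print("no InfiniBand devices under /sys/class/infiniband")
    for device, device_ports in found.items():
        print(device, "ports:", ", ".join(device_ports))
```

An empty result on a node is consistent with the driver not being loaded there, but it does not say why; checking lspci, as suggested elsewhere in this thread, covers the hardware side.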

Re: [OMPI users] Infiniband errors

2012-11-28 Thread Syed Ahsan Ali
I am not sure about drivers because those were installed by someone else during cluster setup. I see following information about infiniband card. The card is DDR InfiniBand Mellanox ConnectX. On Wed, Nov 28, 2012 at 3:17 PM, John Hearns wrote: > Those diagnostics are from Openfabrics. > What ty

Re: [OMPI users] Infiniband errors

2012-11-28 Thread John Hearns
Those diagnostics are from Openfabrics. What type of infiniband card do you have? What drivers are you using?

Re: [OMPI users] Infiniband errors

2012-11-28 Thread Syed Ahsan Ali
Does ibstats come with some other distribution? I don't have this command available right now. On Wed, Nov 28, 2012 at 1:14 PM, John Hearns wrote: > Short answer. Run ibstats or ibstatus. > Look also at the logs of your subnet manager. > > ___ > users mail

Re: [OMPI users] Infiniband errors

2012-11-28 Thread John Hearns
Short answer. Run ibstats or ibstatus. Look also at the logs of your subnet manager.

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-10 Thread Randolph Pullen
Cc: OpenMPI Users Sent: Monday, 10 September 2012 9:11 PM Subject: Re: [OMPI users] Infiniband performance Problem and stalling Randolph, So what you're saying in short, leaving all the numbers aside, is the following: In your particular application on your particular setup with this particular

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-10 Thread Yevgeny Kliteynik
Yevgeny Kliteynik > *To:* Randolph Pullen > *Cc:* OpenMPI Users > *Sent:* Sunday, 9 September 2012 6:18 PM > *Subject:* Re: [OMPI users] Infiniband performance Problem and stalling > > Randolph, > > O

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-09 Thread Randolph Pullen
See my comments in line... From: Yevgeny Kliteynik To: Randolph Pullen Cc: OpenMPI Users Sent: Sunday, 9 September 2012 6:18 PM Subject: Re: [OMPI users] Infiniband performance Problem and stalling Randolph, On 9/7/2012 7:43 AM, Randolph Pullen wrote

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-09 Thread Yevgeny Kliteynik
Randolph, On 9/7/2012 7:43 AM, Randolph Pullen wrote: > Yevgeny, > The ibstat results: > CA 'mthca0' > CA type: MT25208 (MT23108 compat mode) What you have is InfiniHost III HCA, which is 4x SDR card. This card has theoretical peak of 10 Gb/s, which is 1GB/s in IB bit coding. > And more interest
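
The 10 Gb/s versus 1 GB/s figures above follow from the link width and the line coding. A small worked sketch, assuming the standard SDR numbers (2.5 Gb/s signalling per lane, 8b/10b coding); the function name ib_data_rate is only illustrative:

```python
def ib_data_rate(lanes=4, lane_gbps=2.5, payload_bits=8, coded_bits=10):
    """Return (signalling Gb/s, data Gb/s, data GB/s) for an InfiniBand link."""
    signal_gbps = lanes * lane_gbps  # raw rate on the wire
    data_gbps = signal_gbps * payload_bits / coded_bits  # after 8b/10b coding
    return signal_gbps, data_gbps, data_gbps / 8  # 8 bits per byte


if __name__ == "__main__":
    signal, data, gbytes = ib_data_rate()
    print(f"4x SDR: {signal} Gb/s signalling, {data} Gb/s data, {gbytes} GB/s")
```

This is a theoretical ceiling only; measured MPI bandwidth on such a card will be lower.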

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-07 Thread Randolph Pullen
- > *From:* Yevgeny Kliteynik > *To:* Randolph Pullen ; Open MPI Users > > *Sent:* Sunday, 2 September 2012 10:54 PM > *Subject:* Re: [OMPI users] Infiniband performance Problem and stalling > > Randolp

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-07 Thread Randolph Pullen
2012 6:03 PM Subject: Re: [OMPI users] Infiniband performance Problem and stalling On 9/3/2012 4:14 AM, Randolph Pullen wrote: > No RoCE, Just native IB with TCP over the top. Sorry, I'm confused - still not clear what is "Melanox III HCA 10G card". Could you run "ibstat"

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-06 Thread Yevgeny Kliteynik
--- > *From:* Yevgeny Kliteynik > *To:* Randolph Pullen ; Open MPI Users > > *Sent:* Sunday, 2 September 2012 10:54 PM > *Subject:* Re: [OMPI users] Infiniband performance Problem and stalling > > Randolph, > > Some clarification on the setup: > > "

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-02 Thread Randolph Pullen
ay, 2 September 2012 10:54 PM Subject: Re: [OMPI users] Infiniband performance Problem and stalling Randolph, Some clarification on the setup: "Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to Ethernet? That is, when you're using openib BTL, you mean RoCE, right?

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-02 Thread Yevgeny Kliteynik
Randolph, Some clarification on the setup: "Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to Ethernet? That is, when you're using openib BTL, you mean RoCE, right? Also, have you had a chance to try some newer OMPI release? Any 1.6.x would do. -- YK On 8/31/2012 10:53 AM,

Re: [OMPI users] Infiniband performance Problem and stalling

2012-08-30 Thread Randolph Pullen
- On occasions it seems to stall indefinitely, waiting on a single receive.  Any ideas appreciated. Thanks in advance, Randolph From: Randolph Pullen To: Paul Kapinos ; Open MPI Users Sent: Thursday, 30 August 2012 11:46 AM Subject: Re: [OMPI users] Infin

Re: [OMPI users] Infiniband performance Problem and stalling

2012-08-29 Thread Randolph Pullen
64K and force short messages.  Then the openib times are the same as TCP and no faster. I'm still at a loss as to why... From: Paul Kapinos To: Randolph Pullen ; Open MPI Users Sent: Tuesday, 28 August 2012 6:13 PM Subject: Re: [OMPI users] Infin

Re: [OMPI users] Infiniband performance Problem and stalling

2012-08-28 Thread Paul Kapinos
Randolph, after reading this: On 08/28/12 04:26, Randolph Pullen wrote: - On occasions it seems to stall indefinitely, waiting on a single receive. ... I would make a blind guess: are you aware of the IB card parameters for registered memory? http://www.open-mpi.org/faq/?category=openfabrics#

Re: [OMPI users] infiniband with MPI

2012-07-31 Thread Jeff Squyres
On Jul 31, 2012, at 12:14 AM, Joen Chen wrote: > After reading the FAQ about OFED, I knew that openMPI can collaborate with > RoCE. Correct -- Open MPI can use RoCE interfaces, if they are available. > Moreover, using the RoCE make some overhead because the underlying network > layers. In my i

Re: [OMPI users] InfiniBand path migration not working

2012-03-21 Thread Shamis, Pavel
Jeremy, As far as I understand the tool that Evgeny recommended showed that the remote port is reachable. Based on the log that has been provided I can't find the issue in ompi, everything seems to be kosher. Unfortunately, I do not have a platform where I may try to reproduce the issue. I wo

Re: [OMPI users] InfiniBand path migration not working

2012-03-21 Thread Jeremy
Hi Pasha, I just wanted to check if you had any further suggestions regarding the APM issue based on the updated info in my previous email. Thanks, -Jeremy On Mon, Mar 12, 2012 at 12:43 PM, Jeremy wrote: > Hi Pasha, Yevgeny, > >>> My educated guess is that for some reason there is no direct conn

Re: [OMPI users] InfiniBand path migration not working

2012-03-12 Thread Jeremy
Hi Pasha, Yevgeny, >> My educated guess is that for some reason there is no direct connection path >> between lid-2 and lid-4. To prove it we have to look at the OpenSM routing >> information. > If you don't get response or you get info of > the device different that what you would expect, > then

Re: [OMPI users] InfiniBand path migration not working

2012-03-11 Thread Yevgeny Kliteynik
Hi, I just noticed that my previous mail bounced, but it doesn't matter. Please ignore it if you got it anyway - I re-read the thread and there is a much simpler way to do it. If you want to check whether LID L is reachable through HCA H from port P, you can run this command: smpquery --Ca H

Re: [OMPI users] InfiniBand path migration not working

2012-03-09 Thread Jeremy
On Thu, Mar 8, 2012 at 10:44 AM, Shamis, Pavel wrote: > Jeremy, > Finally I had a chance to look at log file. Hi Pasha, I appreciate the review you did and the comments you provided. I will see if we can get some additional routing information. I will also do some experiments with a more trivi

Re: [OMPI users] InfiniBand path migration not working

2012-03-08 Thread Shamis, Pavel
Jeremy, Finally I had a chance to look at the log file. Initially all QPs are created on port 1, and at the same time the alternative path is loaded (ports 2, lids 4 and 2 ). I guess at some point you switch off port 1; an APM event is reported because the alternative path is active now, and for some reason

Re: [OMPI users] InfiniBand path migration not working

2012-02-29 Thread Jeremy
Hi Pasha, >On Wed, Feb 29, 2012 at 11:02 AM, Shamis, Pavel wrote: > > I would like to see all the file. > 28MB is it the size after compression ? > > I think gmail supports up to 25Mb. > You may try to create gzip file and then slice it using "split" command. See attached. At about line 151311 i

Re: [OMPI users] InfiniBand path migration not working

2012-02-29 Thread Shamis, Pavel
> >> On Tue, Feb 28, 2012 at 11:34 AM, Shamis, Pavel wrote: >> I reviewed the code and it seems to be ok :) The error should be reported if >> the port migration is already happened once (port 1 to port 2), and now you >> are trying to shutdown port 2 and MPI reports that it can't migrate anymo

Re: [OMPI users] InfiniBand path migration not working

2012-02-28 Thread Jeremy
Hi Pasha, >On Tue, Feb 28, 2012 at 11:34 AM, Shamis, Pavel wrote: > I reviewed the code and it seems to be ok :) The error should be reported if > the port migration has already happened once (port 1 to port 2), and now you > are trying to shut down port 2 and MPI reports that it can't migrate an

Re: [OMPI users] InfiniBand path migration not working

2012-02-28 Thread Shamis, Pavel
Jeremy, I reviewed the code and it seems to be ok :) The error should be reported if the port migration has already happened once (port 1 to port 2), and now you are trying to shut down port 2 and MPI reports that it can't migrate anymore. It assumes that port 1 is still down and it can't go back

Re: [OMPI users] InfiniBand path migration not working

2012-02-23 Thread Jeremy
Hi Pasha, Thanks for your response. I look forward to hearing from you when you have a chance. -Jeremy On Wed, Feb 22, 2012 at 10:43 PM, Shamis, Pavel wrote: > Jeremy, > I implemented the APM support for openib btl a long time ago. I do not > remember all the details of the implementation, but

Re: [OMPI users] InfiniBand path migration not working

2012-02-22 Thread Shamis, Pavel
Jeremy, I implemented the APM support for openib btl a long time ago. I do not remember all the details of the implementation, but I remember that it is used to support LMC bits and multiple ib ports. Unfortunately I'm super busy this week. I will try look at it early next week. Pavel (Pasha) S

Re: [OMPI users] Infiniband Error

2011-09-12 Thread Yevgeny Kliteynik
This means that you have some problem on that node, and it's probably unrelated to Open MPI. Bad cable? Bad port? FW/driver in some bad state? Do other IB performance tests work OK on this node? Try rebooting the node. -- YK On 12-Sep-11 7:52 AM, Ahsan Ali wrote: > Hello all > > I am getting fol

Re: [OMPI users] InfiniBand, different OpenFabrics transport types

2011-07-19 Thread Bill Johnstone
Yevgeny, Sorry for the delay in replying -- I'd been out for a few days. - Original Message - > From: Yevgeny Kliteynik > Sent: Thursday, July 14, 2011 12:51 AM > Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types   > While I'm try

Re: [OMPI users] InfiniBand, different OpenFabrics transport types

2011-07-14 Thread Yevgeny Kliteynik
On 11-Jul-11 5:23 PM, Bill Johnstone wrote: > Hi Yevgeny and list, > > - Original Message - > >> From: Yevgeny Kliteynik > >> I'll check the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thing and get back to you. > > Thank you. That's interesting... This MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thingy imp

Re: [OMPI users] InfiniBand, different OpenFabrics transport types

2011-07-11 Thread Bill Johnstone
Hi Yevgeny and list, - Original Message - > From: Yevgeny Kliteynik > I'll check the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thing and get back to you. Thank you. > One question though, just to make sure we're on the same page: so the jobs > do run OK on > the older HCAs, as long as they

Re: [OMPI users] InfiniBand, different OpenFabrics transport types

2011-07-10 Thread Yevgeny Kliteynik
Hi Bill, On 08-Jul-11 7:59 PM, Bill Johnstone wrote: > Hello, and thanks for the reply. > > > > - Original Message - >> From: Jeff Squyres >> Sent: Thursday, July 7, 2011 5:14 PM >> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport

Re: [OMPI users] InfiniBand, different OpenFabrics transport types

2011-07-08 Thread Bill Johnstone
Hello, and thanks for the reply. - Original Message - > From: Jeff Squyres > Sent: Thursday, July 7, 2011 5:14 PM > Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types > > On Jun 28, 2011, at 1:46 PM, Bill Johnstone wrote: > >> I have

Re: [OMPI users] InfiniBand, different OpenFabrics transport types

2011-07-07 Thread Jeff Squyres
On Jun 28, 2011, at 1:46 PM, Bill Johnstone wrote: > I have a heterogeneous network of InfiniBand-equipped hosts which are all > connected to the same backbone switch, an older SDR 10 Gb/s unit. > > One set of nodes uses the Mellanox "ib_mthca" driver, while the other uses > the "mlx4" driver.

Re: [OMPI users] Infiniband problem, kernel mismatch

2010-12-03 Thread Peter Kjellström
On Friday 19 November 2010 01:03:35 HeeJin Kim wrote: ... > * mlx4: There is a mismatch between the kernel and the userspace > libraries: Kernel does not support XRC. Exiting.* ... > What I'm thinking is that the infiniband card is installed but it doesn't > work in correct mode. > My linux kerne

Re: [OMPI users] Infiniband problem, kernel mismatch

2010-11-22 Thread Jeff Squyres
On Nov 18, 2010, at 7:03 PM, HeeJin Kim wrote: > I'm using a Mellanox InfiniBand network card and trying to run it with openmpi. > The problem is that I can connect and communicate between nodes, but I'm not > sure whether it is in a correct state or not. > > I have two versions of openmpi, one is

Re: [OMPI users] Infiniband error

2010-11-12 Thread Jeff Squyres
It would be best if an IB vendor replies (hint hint!), but it is likely that you have some kind of hardware issue on that node (e.g., a bad / flakey HCA, etc.). You should probably run a full set of layer-0 diagnostics on your fabric to make sure it's clean. I say this because back when Cisco

Re: [OMPI users] Infiniband Question

2010-02-05 Thread Jeff Squyres
Yep -- it's normal. Those IP addresses are used for bootstrapping/startup, not for MPI traffic. In particular, that "HNP URI" stuff is used by Open MPI's underlying run-time environment. It's not used by the MPI layer at all. On Feb 5, 2010, at 2:32 PM, Mike Hanby wrote: > Howdy, > > When

Re: [OMPI users] infiniband question

2009-09-17 Thread Jeff Squyres
Correct, you don't need DAPL. Can you send all the information listed here: http://www.open-mpi.org/community/help/ On Sep 17, 2009, at 9:17 AM, Yann JOBIC wrote: Hi, I'm new to infiniband. I installed the rdma_cm, rdma_ucm and ib_uverbs kernel modules. When i'm running a ring test o

Re: [OMPI users] Infiniband requirements

2009-06-30 Thread Prentice Bisbal
Gus Correa wrote: > Hi Jim, list > > 1) Your first question: > > I opened a thread on this list two months or so ago about a similar > situation: when OpenMPI would use/not use libnuma. > I asked a question very similar to your question about IB support, > and how the configure script would provi

Re: [OMPI users] Infiniband requirements

2009-06-25 Thread Jeff Squyres
On Jun 25, 2009, at 12:53 PM, Jim Kress wrote: Is it correct to assume that, when one is configuring openmpi v1.3.2 and if one leaves out the --with-openib=/dir from the ./configure command line, that InfiniBand support will NOT be built into openmpi v1.3.2? Then, if an Ethernet network i

Re: [OMPI users] Infiniband requirements

2009-06-25 Thread Gus Correa
Hi Jim, list 1) Your first question: I opened a thread on this list two months or so ago about a similar situation: when OpenMPI would use/not use libnuma. I asked a question very similar to your question about IB support, and how the configure script would provide it or not. Jeff answered it, a

Re: [OMPI users] infiniband problem

2008-11-23 Thread Pavel Shamis (Pasha)
recommend you upgrade your Open MPI installation. v1.2.8 has a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be available "next month"... so watch for an announcement on that front. BTW OMPI 1.2.8 also will be available as part of OFED 1.4 that will be released in end of th

Re: [OMPI users] infiniband problem

2008-11-21 Thread Jeff Squyres
On Nov 20, 2008, at 4:16 PM, Michael Oevermann wrote: with a blank behind /machine. Anyway, your suggested options -mca btl openib,sm,self did help!!! The specific tip here is that on Linux, you want to use the openib BTL, not the udapl BTL. Specifying "--mca btl openib,sm,self" means t

Re: [OMPI users] infiniband problem

2008-11-20 Thread Tim Mattox
BTW - after you get more comfortable with your new-to-you cluster, I recommend you upgrade your Open MPI installation. v1.2.8 has a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be available "next month"... so watch for an announcement on that front. On Thu, Nov 20, 2008 at 3:16

Re: [OMPI users] infiniband problem

2008-11-20 Thread Michael Oevermann
Hi Ralph, that was indeed a typo, the command is of course /usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile /home/sysgen/infiniband-mpi-test/machine /usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1 with a blank behind /machine. Anyway, your suggested options -mca btl openi

Re: [OMPI users] infiniband problem

2008-11-20 Thread Ralph Castain
Your command line may have just come across with a typo, but something isn't right: -hostfile /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/ openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1 That looks more like a path to a binary than a path to a hostfile. Is there a missing space or filenam

Re: [OMPI users] infiniband

2008-05-01 Thread Pavel Shamis (Pasha)
More nice tools for IB monitoring. 1. perfquery (part of OFED), example of report: Port counters: Lid 12 port 1 PortSelect:..1 CounterSelect:...0x SymbolErrors:7836 LinkRecovers:255 LinkDowned:...

Re: [OMPI users] infiniband

2008-04-29 Thread Pavel Shamis (Pasha)
SLIM H.A. wrote: Is it possible to get information about the usage of hca ports similar to the result of the mx_endpoint_info command for Myrinet boards? The ibstat command gives information like this: Port 1: State: Active Physical state: LinkUp but does not say whether a job is actually usin

Re: [OMPI users] infiniband

2008-04-28 Thread Jeff Squyres
Open MPI does not register with HCAs / ports in a way visible through OFED command line tools, sorry... On Apr 27, 2008, at 11:19 AM, SLIM H.A. wrote: Is it possible to get information about the usage of hca ports similar to the result of the mx_endpoint_info command for Myrinet boards? Th

Re: [OMPI users] Infiniband - Any suggestions on "How can you prove to me that OpenMPI is using it?"

2006-12-21 Thread Michael John Hanby
than cheap gig-e). Thanks again. -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres Sent: Wednesday, December 20, 2006 10:01 PM To: Jeff Squyres Cc: Open MPI Users Subject: Re: [OMPI users] Infiniband - Any suggestions on "

Re: [OMPI users] Infiniband - Any suggestions on "How can you prove to me that OpenMPI is using it?"

2006-12-20 Thread Jeff Squyres
On Dec 20, 2006, at 7:04 PM, Jeff Squyres wrote: I've been asked by the owner of the cluster "How can you prove to me that this openmpi job is using the Infiniband network?" At first I thought a simple netstat -an on the compute nodes might tell me, however I don't see the Infiniband IP's in

Re: [OMPI users] Infiniband - Any suggestions on "How can you prove to me that OpenMPI is using it?"

2006-12-20 Thread Andrew J Caird
You can also usually watch the counters on the IB cards and Ethernet cards. For programs that have a lot of communication between nodes it is quickly obvious which network you're using. The IB card monitoring is driver specific, but you should have some tools for this. For Ethernet you can

Re: [OMPI users] Infiniband - Any suggestions on "How can you prove to me that OpenMPI is using it?"

2006-12-20 Thread Jeff Squyres
On Dec 20, 2006, at 6:28 PM, Michael John Hanby wrote: Howdy, I'm new to cluster administration, MPI and high speed networks. I've compiled my OpenMPI using these settings: ./configure CC='icc' CXX='icpc' FC='ifort' F77='ifort' --with-mvapi=/usr/local/topspin --with-mvapi-libdir=/usr/local/top