> On May 29, 2018, at 10:30 PM, Kaiming Ouyang wrote:
>
> I have a question about recompiling openmpi.
> Recently I updated the InfiniBand driver for the Mellanox network card, but I
> found the original openmpi no longer worked. Does this mean a driver update
> must be followed by recompiling openmpi?
Dear Yann
Here is the output
[root@compute-01-01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
[root@compute-01-01 ~]# uname -a
Linux compute-01-01.private.dns.zone 2.6.18-128.el5 #1 SMP Wed Dec 17
11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@com
On Wednesday, 19 December 2012 at 12:12 +0500, Syed Ahsan Ali wrote:
> Dear John
>
> I found this output of ibstatus on some nodes (most probably the ones
> causing the problem)
> [root@compute-01-08 ~]# ibstatus
>
> Fatal error: device '*': sys files not found
> (/sys/class/infiniband/*/ports)
>
> Do
Seems like the driver was not started. I would suggest running lspci and
checking whether the HCA is visible at the hardware level.
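For example, a minimal hardware-level check (the grep pattern assumes a Mellanox HCA; adjust it for other vendors):

  lspci | grep -i mellanox

If the card is listed here but /sys/class/infiniband is empty, the OS sees the hardware and the problem is most likely in the driver stack.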
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Dec 19, 2012, at 2:12 AM, Syed Ahsan Ali wrote:
Dear John
I found this output of ibstatus on some nodes (most probably the ones causing
the problem)
[root@compute-01-08 ~]# ibstatus
Fatal error: device '*': sys files not found
(/sys/class/infiniband/*/ports)
Does this show any hardware or software issue?
Thanks
On Wed, Nov 28, 2012 at 3:17 PM, Jo
I am not sure about the drivers because those were installed by someone else
during cluster setup. I see the following information about the InfiniBand
card. The card is a DDR InfiniBand Mellanox ConnectX.
On Wed, Nov 28, 2012 at 3:17 PM, John Hearns wrote:
> Those diagnostics are from Openfabrics.
> What ty
Those diagnostics are from Openfabrics.
What type of infiniband card do you have?
What drivers are you using?
Does ibstats come with some other distribution? I don't have this command
available right now.
On Wed, Nov 28, 2012 at 1:14 PM, John Hearns wrote:
> Short answer. Run ibstats or ibstatus.
> Look also at the logs of your subnet manager.
>
> ___
> users mail
Short answer. Run ibstats or ibstatus.
Look also at the logs of your subnet manager.
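For example (a sketch; the opensm log location is an assumption and varies by distribution, and the subnet manager may instead run on a managed switch):

  ibstatus
  grep -iE 'error|warn' /var/log/opensm.log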
Cc: OpenMPI Users
Sent: Monday, 10 September 2012 9:11 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling
Randolph,
So what you're saying in short, leaving all the numbers aside, is the following:
In your particular application on your particular setup with this particular
Yevgeny Kliteynik
> To: Randolph Pullen
> Cc: OpenMPI Users
> Sent: Sunday, 9 September 2012 6:18 PM
> Subject: Re: [OMPI users] Infiniband performance Problem and stalling
>
> Randolph,
>
> O
See my comments in line...
From: Yevgeny Kliteynik
To: Randolph Pullen
Cc: OpenMPI Users
Sent: Sunday, 9 September 2012 6:18 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling
Randolph,
On 9/7/2012 7:43 AM, Randolph Pullen wrote
Randolph,
On 9/7/2012 7:43 AM, Randolph Pullen wrote:
> Yevgeny,
> The ibstat results:
> CA 'mthca0'
> CA type: MT25208 (MT23108 compat mode)
What you have is an InfiniHost III HCA, which is a 4x SDR card.
This card has a theoretical peak of 10 Gb/s signaling rate, which with the
8b/10b IB bit encoding comes to 8 Gb/s, i.e. about 1 GB/s of data.
> And more interest
-
> From: Yevgeny Kliteynik
> To: Randolph Pullen ; Open MPI Users
>
> Sent: Sunday, 2 September 2012 10:54 PM
> Subject: Re: [OMPI users] Infiniband performance Problem and stalling
>
> Randolp
2012 6:03 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling
On 9/3/2012 4:14 AM, Randolph Pullen wrote:
> No RoCE, Just native IB with TCP over the top.
Sorry, I'm confused - it's still not clear what a "Melanox III HCA 10G card" is.
Could you run "ibstat"?
---
> From: Yevgeny Kliteynik
> To: Randolph Pullen ; Open MPI Users
>
> Sent: Sunday, 2 September 2012 10:54 PM
> Subject: Re: [OMPI users] Infiniband performance Problem and stalling
>
> Randolph,
>
> Some clarification on the setup:
>
> "Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to Ethernet?
Sent: Sunday, 2 September 2012 10:54 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling
Randolph,
Some clarification on the setup:
"Melanox III HCA 10G
cards" - are those ConnectX 3 cards configured to Ethernet?
That is, when you're using openib BTL, you mean RoCE, right?
Randolph,
Some clarification on the setup:
"Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to Ethernet?
That is, when you're using openib BTL, you mean RoCE, right?
Also, have you had a chance to try some newer OMPI release?
Any 1.6.x would do.
-- YK
On 8/31/2012 10:53 AM,
- On occasions it seems to stall indefinitely, waiting on a single receive.
Any ideas appreciated.
Thanks in advance,
Randolph
From: Randolph Pullen
To: Paul Kapinos ; Open MPI Users
Sent: Thursday, 30 August 2012 11:46 AM
Subject: Re: [OMPI users] Infin
64K and force short messages. Then the openib times are
the same as TCP and no faster.
I'm still at a loss as to why...
From: Paul Kapinos
To: Randolph Pullen ; Open MPI Users
Sent: Tuesday, 28 August 2012 6:13 PM
Subject: Re: [OMPI users] Infin
Randolph,
after reading this:
On 08/28/12 04:26, Randolph Pullen wrote:
- On occasions it seems to stall indefinitely, waiting on a single receive.
... I would make a blind guess: are you aware of the IB card parameters for
registered memory?
http://www.open-mpi.org/faq/?category=openfabrics#
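As a rough sketch of what that FAQ entry covers, assuming a ConnectX-family card driven by mlx4_core (the mthca driver used by InfiniHost cards has different parameters):

  cat /sys/module/mlx4_core/parameters/log_num_mtt
  cat /sys/module/mlx4_core/parameters/log_mtts_per_seg

The amount of registerable memory is roughly 2^log_num_mtt * 2^log_mtts_per_seg * page size, and the FAQ suggests it should be at least the size of physical RAM.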
On Jul 31, 2012, at 12:14 AM, Joen Chen wrote:
> After reading the FAQ about OFED, I knew that openMPI can collaborate with
> RoCE.
Correct -- Open MPI can use RoCE interfaces, if they are available.
> Moreover, using RoCE adds some overhead because of the underlying network
> layers. In my i
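Incidentally, one quick way to tell whether a given port is set up for native IB or RoCE is the "Link layer" field in ibstat output (the device name here is a placeholder):

  ibstat mlx4_0 | grep 'Link layer'

"Link layer: Ethernet" means the port would be used via RoCE; "Link layer: InfiniBand" means native IB.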
Jeremy,
As far as I understand the tool that Evgeny recommended showed that the remote
port is reachable.
Based on the logs that have been provided I can't find the issue in ompi;
everything seems to be kosher.
Unfortunately, I do not have a platform where I may try to reproduce the issue.
I wo
Hi Pasha,
I just wanted to check if you had any further suggestions regarding
the APM issue based on the updated info in my previous email.
Thanks,
-Jeremy
On Mon, Mar 12, 2012 at 12:43 PM, Jeremy wrote:
> Hi Pasha, Yevgeny,
>
>>> My educated guess is that for some reason there is no direct conn
Hi Pasha, Yevgeny,
>> My educated guess is that for some reason there is no direct connection path
>> between lid-2 and lid-4. To prove it we have to look at the OpenSM routing
>> information.
> If you don't get a response, or you get info for
> a device different from what you would expect,
> then
Hi,
I just noticed that my previous mail bounced,
but it doesn't matter. Please ignore it if
you got it anyway - I re-read the thread and
there is a much simpler way to do it.
If you want to check whether LID L is reachable
through HCA H from port P, you can run this command:
smpquery --Ca H
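For illustration only, with hypothetical values filled in (HCA mlx4_0, local port 1, target LID 4):

  smpquery -C mlx4_0 -P 1 nodeinfo 4

If the query times out rather than returning node info, that LID is not reachable from that port.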
On Thu, Mar 8, 2012 at 10:44 AM, Shamis, Pavel wrote:
> Jeremy,
> Finally I had a chance to look at log file.
Hi Pasha,
I appreciate the review you did and the comments you provided. I will
see if we can get some additional routing information. I will also do
some experiments with a more trivi
Jeremy,
Finally I had a chance to look at log file.
Initially all QPs are created on port 1, and at the same time the alternative
path is loaded (port 2, lids 4 and 2). I guess at some point you switch off port 1,
an APM event is reported because the alternative path is active now, and for some
reason
Hi Pasha,
>On Wed, Feb 29, 2012 at 11:02 AM, Shamis, Pavel wrote:
>
> I would like to see the whole file.
> Is 28 MB the size after compression?
>
> I think gmail supports up to 25 MB.
> You may try to create a gzip file and then slice it using the "split" command.
See attached. At about line 151311 i
>
>> On Tue, Feb 28, 2012 at 11:34 AM, Shamis, Pavel wrote:
>> I reviewed the code and it seems to be ok :) The error should be reported if
>> the port migration has already happened once (port 1 to port 2), and now you
>> are trying to shut down port 2 and MPI reports that it can't migrate anymo
Hi Pasha,
>On Tue, Feb 28, 2012 at 11:34 AM, Shamis, Pavel wrote:
> I reviewed the code and it seems to be ok :) The error should be reported if
> the port migration has already happened once (port 1 to port 2), and now you
> are trying to shut down port 2 and MPI reports that it can't migrate an
Jeremy,
I reviewed the code and it seems to be ok :) The error should be reported if
the port migration has already happened once (port 1 to port 2), and now you are
trying to shut down port 2 and MPI reports that it can't migrate anymore. It
assumes that port 1 is still down and it can't go back
Hi Pasha,
Thanks for your response. I look forward to hearing from you when you
have a chance.
-Jeremy
On Wed, Feb 22, 2012 at 10:43 PM, Shamis, Pavel wrote:
> Jeremy,
> I implemented the APM support for openib btl a long time ago. I do not
> remember all the details of the implementation, but
Jeremy,
I implemented the APM support for the openib btl a long time ago. I do not remember
all the details of the implementation, but I remember that it is used to
support LMC bits and multiple IB ports. Unfortunately I'm super busy this week.
I will try to look at it early next week.
Pavel (Pasha) S
This means that you have some problem on that node,
and it's probably unrelated to Open MPI.
Bad cable? Bad port? FW/driver in some bad state?
Do other IB performance tests work OK on this node?
Try rebooting the node.
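As an example of such a test below the MPI layer, the pingpong utility shipped with libibverbs can be run between the suspect node and a known-good one (hostnames are placeholders):

  node1$ ibv_rc_pingpong
  node2$ ibv_rc_pingpong node1

If this fails or is slow, the problem is in the fabric or driver, not in Open MPI.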
-- YK
On 12-Sep-11 7:52 AM, Ahsan Ali wrote:
> Hello all
>
> I am getting fol
Yevgeny,
Sorry for the delay in replying -- I'd been out for a few days.
- Original Message -
> From: Yevgeny Kliteynik
> Sent: Thursday, July 14, 2011 12:51 AM
> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
> While I'm try
On 11-Jul-11 5:23 PM, Bill Johnstone wrote:
> Hi Yevgeny and list,
>
> - Original Message -
>
>> From: Yevgeny Kliteynik
>
>> I'll check the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thing and get back to you.
>
> Thank you.
That's interesting...
This MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thingy imp
Hi Yevgeny and list,
- Original Message -
> From: Yevgeny Kliteynik
> I'll check the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thing and get back to you.
Thank you.
> One question though, just to make sure we're on the same page: so the jobs
> do run OK on
> the older HCAs, as long as they
Hi Bill,
On 08-Jul-11 7:59 PM, Bill Johnstone wrote:
> Hello, and thanks for the reply.
>
>
>
> - Original Message -
>> From: Jeff Squyres
>> Sent: Thursday, July 7, 2011 5:14 PM
>> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport
Hello, and thanks for the reply.
- Original Message -
> From: Jeff Squyres
> Sent: Thursday, July 7, 2011 5:14 PM
> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
>
> On Jun 28, 2011, at 1:46 PM, Bill Johnstone wrote:
>
>> I have
On Jun 28, 2011, at 1:46 PM, Bill Johnstone wrote:
> I have a heterogeneous network of InfiniBand-equipped hosts which are all
> connected to the same backbone switch, an older SDR 10 Gb/s unit.
>
> One set of nodes uses the Mellanox "ib_mthca" driver, while the other uses
> the "mlx4" driver.
On Friday 19 November 2010 01:03:35 HeeJin Kim wrote:
...
> mlx4: There is a mismatch between the kernel and the userspace
> libraries: Kernel does not support XRC. Exiting.
...
> What I'm thinking is that the infiniband card is installed but it doesn't
> work in correct mode.
> My linux kerne
On Nov 18, 2010, at 7:03 PM, HeeJin Kim wrote:
> I'm using Mellanox infiniband network card and trying to run it with openmpi.
> The problem is that I can connect and communicate between nodes, but I'm not
> sure whether it is in a correct state or not.
>
> I have two version of openmpi, one is
It would be best if an IB vendor replies (hint hint!), but it is likely that
you have some kind of hardware issue on that node (e.g., a bad / flakey HCA,
etc.). You should probably run a full set of layer-0 diagnostics on your
fabric to make sure it's clean.
I say this because back when Cisco
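One commonly used tool for that kind of fabric-level sweep is ibdiagnet from the OFED diagnostics (exact options vary between OFED releases):

  ibdiagnet

and then check the generated reports for symbol errors, link-downed counters, and similar.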
Yep -- it's normal.
Those IP addresses are used for bootstrapping/startup, not for MPI traffic. In
particular, that "HNP URI" stuff is used by Open MPI's underlying run-time
environment. It's not used by the MPI layer at all.
On Feb 5, 2010, at 2:32 PM, Mike Hanby wrote:
> Howdy,
>
> When
Correct, you don't need DAPL. Can you send all the information listed
here:
http://www.open-mpi.org/community/help/
On Sep 17, 2009, at 9:17 AM, Yann JOBIC wrote:
Hi,
I'm new to infiniband.
I installed the rdma_cm, rdma_ucm and ib_uverbs kernel modules.
When I'm running a ring test o
Gus Correa wrote:
> Hi Jim, list
>
> 1) Your first question:
>
> I opened a thread on this list two months or so ago about a similar
> situation: when OpenMPI would use/not use libnuma.
> I asked a question very similar to your question about IB support,
> and how the configure script would provi
On Jun 25, 2009, at 12:53 PM, Jim Kress wrote:
Is it correct to assume that, when one is configuring openmpi v1.3.2
and if
one leaves out the
--with-openib=/dir
from the ./configure command line, that InfiniBand support will NOT
be built
into openmpi v1.3.2? Then, if an Ethernet network i
Hi Jim, list
1) Your first question:
I opened a thread on this list two months or so ago about a similar
situation: when OpenMPI would use/not use libnuma.
I asked a question very similar to your question about IB support,
and how the configure script would provide it or not.
Jeff answered it, a
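Independently of how configure was invoked, a quick way to see whether a given Open MPI build actually contains InfiniBand support is to look for the openib BTL in ompi_info output:

  ompi_info | grep openib

If no openib BTL component is listed, that build cannot use the verbs stack.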
recommend you upgrade your Open MPI installation. v1.2.8 has
a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be
available "next month"... so watch for an announcement on that front.
BTW OMPI 1.2.8 also will be available as part of OFED 1.4 that will be
released in end of th
On Nov 20, 2008, at 4:16 PM, Michael Oevermann wrote:
with a space after /machine. Anyway, your suggested options -mca
btl openib,sm,self
did help!!!
The specific tip here is that on Linux, you want to use the openib
BTL, not the udapl BTL. Specifying "--mca btl openib,sm,self" means
t
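For reference, a full invocation along those lines might look like this (the host file path and binary are illustrative, taken from the command quoted elsewhere in the thread):

  mpirun -np 4 -hostfile /home/sysgen/infiniband-mpi-test/machine \
      --mca btl openib,sm,self \
      /usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1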
BTW - after you get more comfortable with your new-to-you cluster, I
recommend you upgrade your Open MPI installation. v1.2.8 has
a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be
available "next month"... so watch for an announcement on that front.
On Thu, Nov 20, 2008 at 3:16
Hi Ralph,
that was indeed a typo, the command is of course
/usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile
/home/sysgen/infiniband-mpi-test/machine
/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
with a space after /machine. Anyway, your suggested options -mca btl
openi
Your command line may have just come across with a typo, but something
isn't right:
-hostfile /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/
openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
That looks more like a path to a binary than a path to a hostfile. Is
there a missing space or filenam
Another nice tool for IB monitoring:
1. perfquery (part of OFED), example of report:
Port counters: Lid 12 port 1
PortSelect:..1
CounterSelect:...0x
SymbolErrors:7836
LinkRecovers:255
LinkDowned:...
SLIM H.A. wrote:
Is it possible to get information about the usage of hca ports similar
to the result of the mx_endpoint_info command for Myrinet boards?
The ibstat command gives information like this:
Port 1:
State: Active
Physical state: LinkUp
but does not say whether a job is actually usin
Open MPI does not register with HCAs / ports in a way visible through
OFED command line tools, sorry...
On Apr 27, 2008, at 11:19 AM, SLIM H.A. wrote:
Is it possible to get information about the usage of hca ports similar
to the result of the mx_endpoint_info command for Myrinet boards?
Th
than
cheap gig-e).
Thanks again.
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Wednesday, December 20, 2006 10:01 PM
To: Jeff Squyres
Cc: Open MPI Users
Subject: Re: [OMPI users] Infiniband - Any suggestions on
On Dec 20, 2006, at 7:04 PM, Jeff Squyres wrote:
I've been asked by the owner of the cluster "How can you prove to me
that this openmpi job is using the Infiniband network?"
At first I thought a simple netstat -an on the compute nodes might
tell
me, however I don't see the Infiniband IP's in
You can also usually watch the counters on the IB cards and
Ethernet cards. For programs that have a lot of communication
between nodes it is quickly obvious which network you're using.
The IB card monitoring is driver specific, but you should have
some tools for this. For Ethernet you can
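On the IB side, one option is to read the port counters exposed in sysfs before and after a run (the device and port names are assumptions; check ls /sys/class/infiniband for the actual names):

  cat /sys/class/infiniband/mthca0/ports/1/counters/port_xmit_data
  cat /sys/class/infiniband/mthca0/ports/1/counters/port_rcv_data

A large jump in these counters during the job is a good sign the traffic is really going over InfiniBand.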
On Dec 20, 2006, at 6:28 PM, Michael John Hanby wrote:
Howdy, I'm new to cluster administration, MPI and high speed networks.
I've compiled my OpenMPI using these settings:
./configure CC='icc' CXX='icpc' FC='ifort' F77='ifort'
--with-mvapi=/usr/local/topspin
--with-mvapi-libdir=/usr/local/top