You have to check the port states on *all* nodes in the run/job/submission.
Checking a single node is not enough.
My guess is that node 01-00 tries to connect to node 01-01 and the ports are down on 01-01.
You may disable InfiniBand support by adding --mca btl ^openib.
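For example (a sketch only; it assumes passwordless ssh and a "hostfile" listing every
node in the job), ibstat on each node should report an Active/LinkUp port:

  $ for h in $(cat hostfile); do
      echo "== $h =="
      ssh $h "ibstat | grep -E 'State|Physical state'"
    done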
Best,
Pavel (Pasha) Shamis
---
On Mon, Jul 21, 2014 at 6:57 PM, Shamis, Pavel
<sham...@ornl.gov> wrote:
You have to check the port states on *all* nodes in the run/job/submission.
Checking a single node is not enough.
My guess is that node 01-00 tries to connect to node 01-01 and the ports are down on 01-01.
You may disable InfiniBand support by adding --mca btl ^openib.
20 Gb/sec (4X DDR)
On Tue, Jul 22, 2014 at 6:50 PM, Shamis, Pavel
<sham...@ornl.gov> wrote:
Hmm, this does not make sense.
Your copy-and-paste shows that both machines (00 and 01) have the same GUID/LID
(sort of the equivalent of a MAC address in the Ethernet world).
As you can guess, these
Hi Timur,
I don't think this is an apples-to-apples comparison.
In the OpenSHMEM world, "MPI_Waitall" would be mapped to shmem_quiet().
Even with this mapping, shmem_quiet() has *stronger* completion semantics than
MPI_Waitall: waiting on sends only guarantees local completion (the buffers may be
reused), while quiet guarantees that the data was delivered to the remote
Terry,
Ishai Rabinovitz is the HPC team manager (I added him to CC).
Eloi,
Back to the issue. I have seen a very similar issue a long time ago on some hardware
platforms that support relaxed-ordering memory operations. If I remember
correctly, it was some IBM platform.
Do you know if relaxed memory ordering is
The FW version looks OK, but it may be a driver issue as well. I guess that an OFED
1.4.x or 1.5.x driver should be OK.
To check the driver version, you may run the ofed_info command.
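For example (a sketch; the -s flag prints just the version string on recent OFED
releases):

  $ ofed_info -s         # short form, e.g. "OFED-1.5.3"
  $ ofed_info | head -1  # or take the first line of the full report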
Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National
Regards, Gilbert.
On Jan 7, 2011, at 4:14 PM, Shamis, Pavel wrote:
The FW version looks OK, but it may be a driver issue as well. I guess that an OFED
1.4.x or 1.5.x driver should be OK.
To check the driver version, you may run the ofed_info command.
Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools
Please see the comments inline.
When I use the cmd "mpirun -host LB270210,CB060106,CB060107 -np 3 --mca btl
openib,self,sm a_user.out" or "mpirun -host LB270210,CB060106,CB060107 -np 3
--mca btl openib,self,sm --mca btl_openib_cpc_include rdmacm a_user.out", it will
print the correct result, but when I use the cmd
As far as I remember, we don't allow the user to specify an SL for RoCE. RoCE is
considered kind of an Ethernet device, and the RDMACM connection manager is used to
set up the connections. It means that in order to select network X or Y, you may
use an IP/netmask (btl_openib_ipaddr_include).
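For example (a sketch; the subnet is hypothetical), restricting the openib BTL to
one network looks like this:

  $ mpirun --mca btl openib,self,sm \
           --mca btl_openib_ipaddr_include "192.168.1.0/24" \
           -np 4 ./a.out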
Pavel (Pasha) Shamis
---
s wrote:
>> Random thought: is there a check to ensure that the SL MCA param is not set
>> in a RoCE environment? If not, we should probably add a show_help warning
>> if the SL MCA param is set when using RoCE (i.e., that its value will be
>> ignored).
>>
>>
Please see my comment below.
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Mar 11, 2011, at 2:47 PM, Bernardo F Costa wrote:
> I have found this thread from two years ago. I am somewhat lost on
> configuring
I would recommend reading the OFED (or Mellanox OFED) documentation. It will be a
good starting point.
Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Mar 14, 2011, at 4:37 PM, Bernardo F Costa wrote:
> O
You may find some initial XRC tuning documentation here:
https://svn.open-mpi.org/trac/ompi/ticket/1260
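For example (a sketch only; the queue sizes are placeholders, not recommendations),
XRC queues are selected through the receive-queues parameter:

  $ mpirun --mca btl openib,self,sm \
           --mca btl_openib_receive_queues "X,4096,128:X,12288,128:X,65536,128" \
           -np 4 ./a.out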
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Aug 1, 2011, at 11:41 AM, Yevgeny Kliteynik wrote:
> Hi,
You may try to update your OFED version. I think 1.5.3 is the latest one.
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Aug 25, 2011, at 7:46 PM, wrote:
>
> Hi all,
>
> it is more hardware or system confi
>
> * btl_openib_ib_retry_count - The number of times the sender will
> attempt to retry (defaulted to 7, the maximum value).
> * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
> to 10). The actual timeout value used is calculated as:
> 4.096 microseconds * (2^btl_openib_ib_timeout)
Actually I'm surprised that defaul
An alternative solution for the problem is updating your memory limits.
Please see below:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
Apparently your memory limit is low and the driver fails to create QPs.
What happens when you add the following to your mpirun command?
-mca btl
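If the locked-memory limit is the culprit, something like this usually shows it
(a sketch; the limits.conf entries assume the limit is enforced via PAM on the
compute nodes):

  $ ulimit -l    # max locked memory in KB; ideally "unlimited"
  # In /etc/security/limits.conf on all nodes:
  * soft memlock unlimited
  * hard memlock unlimited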
I would recommend that you verify that you don't have any bad cables or ports
in your IB network. You may use one of the OFA network check tools, for example
ibdiagnet.
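A minimal run looks something like this (a sketch; run it from a node that can
reach the subnet manager):

  $ ibdiagnet              # scans the fabric and reports bad links/ports
  $ ibdiagnet -ls 5 -lw 4x # optionally check link speed/width (e.g. 4X DDR)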
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
You may find additional information here:
https://svn.open-mpi.org/trac/ompi/ticket/1900
Using the information there you may calculate the actual memory consumption.
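As a rough worked example (assumptions: the field meanings of the queue spec used
elsewhere in this thread, i.e. type,size,count,...; receive-buffer memory is
roughly size x count):

  # S,65536,256,... -> one shared receive queue: 64 KB x 256 buffers ~= 16 MB per HCA
  # P,128,256,...   -> per-peer queue: 128 B x 256 buffers = 32 KB *per connected peer*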
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On
Salvatore,
Please see my comment inline.
>
> More generally, in the case of front-end nodes with processors
> definitely different from the
> worker nodes (same firm, i.e. Intel), can Open MPI applications compiled on one
> run correctly
> on the others?
It is possible to come up with a set of
Jeremy,
I implemented the APM support for the openib BTL a long time ago. I do not remember
all the details of the implementation, but I remember that it is used to
support LMC bits and multiple IB ports. Unfortunately, I'm super busy this week.
I will try to look at it early next week.
Pavel (Pasha) S
Laboratory
On Feb 23, 2012, at 9:00 PM, Jeremy wrote:
> Hi Pasha,
> Thanks for your response. I look forward to hearing from you when you
> have a chance.
>
> -Jeremy
>
> On Wed, Feb 22, 2012 at 10:43 PM, Shamis, Pavel wrote:
>> Jeremy,
>> I implemented the APM
>
>> On Tue, Feb 28, 2012 at 11:34 AM, Shamis, Pavel wrote:
>> I reviewed the code and it seems to be OK :) The error should be reported if
>> the port migration has already happened once (port 1 to port 2), and now you
>> are trying to shut down port 2 and MPI re
(Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Feb 29, 2012, at 5:38 PM, Jeremy wrote:
> Hi Pasha,
>
>> On Wed, Feb 29, 2012 at 11:02 AM, Shamis, Pavel wrote:
>>
>> I would like to see all
>
> Can anyone tell me whether it is legal to pass zero values for some of the
> send count elements in an MPI_Alltoallv() call? I want to perform an
> all-to-all operation, but for performance reasons I do not want to send data to
> various ranks that don't need to receive any useful values. If it
Jeremy,
As far as I understand, the tool that Evgeny recommended showed that the remote
port is reachable.
Based on the log that has been provided, I can't find an issue in OMPI;
everything seems to be kosher.
Unfortunately, I do not have a platform where I may try to reproduce the issue.
I wo
I have solved this issue. All the paths were correct, but I still had to use
mpirun -x LD_LIBRARY_PATH while executing the job.
Another option is to update your .bashrc/.cshrc.
Just add the LD_LIBRARY_PATH to the file, and the updated variable will be
available on the remote machines as well.
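For example (a sketch; the library path is hypothetical):

  $ mpirun -x LD_LIBRARY_PATH -np 4 ./a.out
  # or, in ~/.bashrc on every node:
  export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH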
(I ma
>
> Thanks a lot for the clarification. Do you know if these empirical tests
> have been published somehow/somewhere?
I think it was published in very early OSU/MVAPICH related papers.
Regards,
P.
>
> cheers, Simone
>
>>
>> On May 7, 2012, at 9:25 AM, Simone Pellegrini wrote:
>>
>>> Hello,
Another good reason for the ummunotify kernel module
(http://lwn.net/Articles/345013/)
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Nov 8, 2012, at 9:08 AM, Jeff Squyres wrote:
On Nov 8, 2012, at 8:51 AM, Rolf vandeVaart
b/65188
On Nov 8, 2012, at 9:38 AM, Shamis, Pavel wrote:
Another good reason for the ummunotify kernel module
(http://lwn.net/Articles/345013/)
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Nov 8, 2012, at 9:08 AM
>> (cuda), that equation might be changing, but I'll leave that to Nvidia to
>> push on Linus. :-)
>>
>> Something like ummunotify should be in the ibcore area in the kernel. And
>> at least initially, probably something like this should be in the cuda
>>
Sounds like the self BTL does not support CUDA IPC.
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Dec 13, 2012, at 11:46 AM, Justin Luitjens wrote:
Thank you everyone for your responses. I believe I have narrowed do
Seems like the driver was not started. I would suggest running lspci and checking
whether the HCA is visible at the HW level.
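Something like this (a sketch; the vendor string varies by HCA):

  $ lspci | grep -i mellanox  # the HCA should show up on the PCI bus
  $ ibstat                    # if the driver is loaded, this lists the ports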
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Dec 19, 2012, at 2:12 AM, Syed Ahsan Ali wrote:
Dear Joh
>
> Switching to SRQ and some guessed queue values appears to let the
> code run.
> S,4096,128:S,12288,128:S,65536,12
>
> Two questions,
>
> This is a ConnectX fabric; should I switch them to XRC queues? And should I
> use the same queue size/count? Is that a safe assumption?
> X,4096
> We just learned about MXM, and given most of our cards are Mellanox ConnectX
> cards (though not all; we have islands of pre-ConnectX and QLogic gear
> supported in the same Open MPI environment),
>
> Will MXM correctly fail through to PSM if on QLogic gear, and fail through to
> OpenIB if on p
ter scalability, you may move the threshold up. Bottom line, you
have to experiment and see what is good for you :)
-Pasha
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> bro...@umich.edu
> (734)936-1985
>
>
>
> On Jan 22, 2013, at 2:58 PM, Sham
>
> You sound like our vendors, "what is your app"
;-) I used to be one.
Ideally, OMPI should do the switch between MXM/RC/XRC internally in the
transport layer. Unfortunately,
we don't have such smart selection logic. Hopefully the IB vendors will fix it
some day.
>
> Note most of our users run
>>> You sound like our vendors, "what is your app"
>>
>> ;-) I used to be one.
>>
>> Ideally, OMPI should do the switch between MXM/RC/XRC internally in the
>> transport layer. Unfortunately,
>> we don't have such smart selection logic. Hopefully the IB vendors will fix it
>> some day.
>
> I actua
> have a user whose code at scale dies reliably with the errors (new hosts each
> time):
>
> We have been using for this code:
> -mca btl_openib_receive_queues X,4096,128:X,12288,128:X,65536,12
>
> Without that option it dies with an out of memory message reliably.
>
> Note this code runs fine
Also make sure that processes were not swapped out to a hard drive.
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Jan 27, 2013, at 6:39 AM, John Hearns
<hear...@googlemail.com> wrote:
2 percent?
Have yo
For each communicator, OMPI manages a table of collective functions. In order to
pass communicator initialization, the table should be filled with functions.
The "tuned" component doesn't provide all the functions (at least it used to be
this way). In order to fill the table,
you should use tuned,basic
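In practice that means listing both components on the command line (a sketch;
it mirrors the component list above):

  $ mpirun --mca coll tuned,basic -np 4 ./a.out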
You may find a more up-to-date version on the OFED website:
http://www.openfabrics.org/downloads/dapl/
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Feb 22, 2013, at 2:18 PM, Orion Poplawski wrote:
> Is the uDAPL projec
I was sure that this one was fixed as well.
I think this is some other problem, related to the combination of XRC + the
Intel compiler.
Due to the limited availability of XRC systems, I doubt the combination was
tested.
I'm trying to look at the issue, but first I have to find the setup...
Pa
The default implementation of the collectives is based on the PML (p2p layer),
which is implemented on top of the BTLs.
Consequently, it leverages RDMA capabilities to some extent.
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Jun 6
, etc. Some algorithms will require O(log(n)) connections, others O(n).
Also, at the openib BTL level we create multiple QPs per rank, not just one. To
make things even more complicated :-) there are multiple types of QPs, like RC
and XRC.
-Pasha
--
Joba
On Thu, Jun 6, 2013 at 3:37 PM, Shamis
You may use tools like this one: http://linux.die.net/man/1/ibdiagnet
to debug your IB network problems. Most likely, you have some bad cable or
connector somewhere in the network.
The tool should be able to pinpoint the problem.
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Scien
Ben,
You may try disabling the registration cache; it may relieve pressure on memory
resources:
--mca mpi_leave_pinned 0
You may find a bit more details here:
http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
Using this option you may observe a drop in BW performance.
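For example (a sketch):

  $ mpirun --mca mpi_leave_pinned 0 -np 4 ./a.out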
R
+1 for Chuck Norris
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Oct 23, 2013, at 1:12 PM, Mike Dubman
<mi...@dev.mellanox.co.il> wrote:
maybe to add some nice/funny slogan on the front under the logo, a
Hey Sid ;)
Please see inline.
1.1. What is the meaning of "phase 1" fragment?
[Shamis, Pavel] Phase one of the rendezvous (RNDV) protocol. Essentially, we send
the first request of the RNDV as an eager message of size N.
The receiver unpacks the message and switches to an RNDV get or put
Did you try to run ibdiagnet to check the network?
Also, how many devices do you have on the same node?
It says "mlx4_14" - do you have 14 HCAs on the same machine?!
Best,
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
From: John Marshall <john.marsh...@ssc-spc.gc.ca>
To: Open Users <us...@open-mpi.org>
Date: Friday, October 16, 2015 2:16 PM
Subject: Re: [OMPI users] openib issue with 1.6.5 but not later releases
On 10/
Please see inline (marked with "Pasha >").
From: users <users-boun...@open-mpi.org> on behalf of John Marshall
<john.marsh...@ssc-spc.gc.ca>
To: Open Users <us...@open-mpi.org>
Date: Monday, October 19, 2015 11:06 AM
Please see inline.
(Liran, can you please comment)
>
>>
>> So, is this setting required if there are multiple IB interfaces (as
>> when there are multiple eth interfaces)? What is curious is that
>> there is only one interface visible from the container. Does the
>> openib btl look deeper and fi
Below is from ompi_info:
$ ompi_info --all | grep btl_openib_receive
MCA btl: parameter "btl_openib_receive_queues" (current value:
,
data source: default value)
This tuning does make sense to me.
#receive_queues = P,128,256,192,128:S,65536,256,192,128
Most likely it w
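To apply the same values from the command line instead of a params file (a sketch):

  $ mpirun --mca btl_openib_receive_queues "P,128,256,192,128:S,65536,256,192,128" \
           -np 4 ./a.out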