Re: [OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-05 Thread Pavel Shamis
Posting this on UCX list. On Thu, Oct 4, 2018 at 4:42 PM Charles A Taylor wrote: > > We are seeing a gaping memory leak when running OpenMPI 3.1.x (or 2.1.2, > for that matter) built with UCX support. The leak shows up > whether the “ucx” PML is specified for the run or not. The applications

Re: [OMPI users] OpenMPI 3.1.2: Run-time failure in UCX PML

2018-09-21 Thread Pavel Shamis
I would suggest posting the error to UCX issues - https://github.com/openucx/ucx/issues It is a typical IB error complaining about an access to unregistered memory. Usually it is caused by some pointer corruption in OMPI/UCX or application code. Best, Pasha On Thu, Sep 20, 2018 at 11:22 PM Ben Menad

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Pavel Shamis
You just have to switch PML to UCX. You have some example of the command line here: https://github.com/openucx/ucx/wiki/OpenMPI-and-OpenSHMEM-installation-with-UCX Best, P. On Thu, Jun 14, 2018 at 3:25 PM Charles A Taylor wrote: > Hmmm. ompi_info only shows the ucx pml. I don’t see any “trans

Re: [OMPI users] UCX and multithreading

2018-04-22 Thread Pavel Shamis
Yossi, can you please comment. Thanks, P. On Tue, Apr 17, 2018 at 07:24 marcin.krotkiewski < marcin.krotkiew...@gmail.com> wrote: > Hi, all, > > I'm reading in the changelog 3.0.0 that > > - Use UCX multi-threaded API in the UCX PML. Requires UCX 1.0 or later. > > Also, the changelog for 3.1.0 i

Re: [OMPI users] [OMPI USERS] Cross-compiling

2018-01-03 Thread Pavel Shamis
Alberto, Have you tried the toolchain from linaro ? https://releases.linaro.org/components/toolchain/binaries/latest/aarch64-linux-gnu/ Best, Pasha On Tue, Nov 14, 2017 at 4:07 AM, Alberto Ortiz wrote: > Hi, > I am trying to run in this type of environment: > > 1- A linux PC in which I intend

Re: [OMPI users] Error building openmpi on Raspberry pi 2

2017-10-03 Thread Pavel Shamis
I'm building on ARMv8 (64bit kernel, ompi master) and so far no problems. On Wed, Sep 27, 2017 at 7:34 AM, Jeff Layton wrote: > I could never get OpenMPI < 2.x to build on a Pi 2. I ended up using the > binary from the repos. Pi 3 is a different matter - I got that to build > after a little expe

Re: [OMPI users] Bad Infiniband latency with subounce

2010-02-18 Thread Pavel Shamis (Pasha)
ll. I just downloaded the OSU benchmarks and tried osu_latency. It reports ~40 microsecs for OpenMPI, and ~3 microsecs for MVAPICH. Still puzzled... Steve -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Pavel Shamis (Pasha)

Re: [OMPI users] Bad Infiniband latency with subounce

2010-02-18 Thread Pavel Shamis (Pasha)
Hey, I can only add that XRC and RC have the same latency. What is the command line that you use to run this benchmark? What is the system configuration (one HCA, one active port)? Any additional information about system configuration, MPI command line, etc. will help to analyze your issue.

Re: [OMPI users] [btl_openib_component.c:1373:btl_openib_component_progress] error polling HP CQ with -2 errno says Success

2009-09-26 Thread Pavel Shamis (Pasha)
Very strange. MPI tries to access the CQ context and it gets an immediate error. Please make sure that your limits configuration is OK; take a look at this FAQ - http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages Pasha. Charles Wright wrote: Hello, I just got some new cluster hardwa

Re: [OMPI users] running open mpi on ubuntu 9.04

2009-09-21 Thread Pavel Shamis (Pasha)
You will not need the trick if you configure Open MPI with the following flag: --enable-mpirun-prefix-by-default Pasha. Hodgess, Erin wrote: the LD_LIBRARY_PATH did the trick; thanks so much! Sincerely, Erin Erin M. Hodgess, PhD Associate Professor Department of Computer and Mathematica

Re: [OMPI users] Job fails after hours of running on a specific node

2009-09-21 Thread Pavel Shamis (Pasha)
Sangamesh, The IB tunings that you added to your command line only delay the problem but don't resolve it. The node-0-2.local gets the asynchronous event "IBV_EVENT_PORT_ERROR"; as a result the processes fail to deliver packets to some remote hosts and you see a bunch of IB errors. T

Re: [OMPI users] RETRY EXCEEDED ERROR status number 12

2009-08-21 Thread Pavel Shamis (Pasha)
You may try to use ibdiagnet tool: http://linux.die.net/man/1/ibdiagnet The tool is part of OFED (http://www.openfabrics.org/) Pasha. Prentice Bisbal wrote: Several jobs on my cluster just died with the error below. Are there any IB/Open MPI diagnostics I should use to diagnose, should I just

Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI

2009-08-05 Thread Pavel Shamis (Pasha)
However, setting: -mca btl_openib_eager_limit 65536 gave a 15% improvement so OpenMPI is now down to 326 (from previous 376 seconds). Still a lot more than ScaliMPI with 214 seconds. Can you please run ibv_devinfo on one of compute nodes ? It is interesting to know what kind of IB HW you have

Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI

2009-08-05 Thread Pavel Shamis (Pasha)
If the above doesn't improve anything the next question is do you know what the sizes of the messages are? For very small messages I believe Scali shows a 2x better performance than Intel and OMPI (I think this is due to a fastpath optimization). I remember that mvapich was faster than sca

Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI

2009-08-05 Thread Pavel Shamis (Pasha)
the following MPI functions being used: MPI_Init MPI_wtime MPI_COMM_RANK MPI_COMM_SIZE MPI_BUFFER_ATTACH MPI_BSEND MPI_PACK MPI_UNPACK MPI_PROBE MPI_GET_COUNT MPI_RECV MPI_IPROBE MPI_FINALIZE where MPI_IPROBE is the clear winner in terms of number of calls. /Torgny Pavel Shamis (Pasha) wrote

Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI

2009-08-05 Thread Pavel Shamis (Pasha)
Do you know if the application uses some collective operations? Thanks Pasha Torgny Faxen wrote: Hello, we are seeing a large difference in performance for some applications depending on what MPI is being used. Attached are performance numbers and oprofile output (first 30 lines) from one

Re: [OMPI users] Tuned collectives: How to choose them dynamically? (-mca coll_tuned_dynamic_rules_filename dyn_rules)"

2009-08-04 Thread Pavel Shamis (Pasha)
Lenny, You can find some details here: http://icl.cs.utk.edu/news_pub/submissions/Flex-collective-euro-pvmmpi-2006.pdf Pasha Lenny Verkhovsky wrote: Hi, I am looking too for a file example of rules for dynamic collectives, Have anybody tried it ? Where can I find a proper syntax for it ? tha

Re: [OMPI users] Using dual infiniband HCA cards

2009-07-30 Thread Pavel Shamis (Pasha)
We have a computational cluster which consists of 8 HP Proliant ML370G5 with 32GB ram. Each node has a Mellanox single port infiniband DDR HCA card (20Gbit/s) and they are connected to each other through a Voltaire ISR9024D-M DDR infiniband switch. Now we want to increase the bandwidth to 40GBit/s

Re: [OMPI users] [OMPI devel] selectively bind MPI to one HCA out of available ones

2009-07-16 Thread Pavel Shamis (Pasha)
Hi, You can select the IB device used by the openib BTL with the following parameter: MCA btl: parameter "btl_openib_if_include" (current value: , data source: default value) Comma-delimited list of devices/ports to be used (e.g. "mthca0,mthca1:2"; empty value means to

Re: [OMPI users] 50% performance reduction due to OpenMPI v 1.3.2 forcing all MPI traffic over Ethernet instead of using Infiniband

2009-06-23 Thread Pavel Shamis (Pasha)
Jim, Can you please share your MCA conf file with us? Pasha. Jim Kress ORG wrote: For the app I am using, ORCA (a Quantum Chemistry program), when it was compiled using openMPI 1.2.8 and run under 1.2.8 with the following in the openmpi-mca-params.conf file: btl=self,openib the app ran fine wit

Re: [OMPI users] scaling problem with openmpi

2009-05-21 Thread Pavel Shamis (Pasha)
I tried to run with the first dynamic rules file that Pavel proposed and it works, the time per one MD step on 48 cores decreased from 2.8 s to 1.8 s as expected. Good news :-) Pasha. Thanks Roman On Wed, May 20, 2009 at 7:18 PM, Pavel Shamis (Pasha) wrote: Tomorrow I will add

Re: [OMPI users] scaling problem with openmpi

2009-05-20 Thread Pavel Shamis (Pasha)
Tomorrow I will add some printf to collective code and check what really happens there... Pasha Peter Kjellstrom wrote: On Wednesday 20 May 2009, Pavel Shamis (Pasha) wrote: Disabling basic_linear seems like a good idea but your config file sets the cut-off at 128 Bytes for 64-ranks (the

Re: [OMPI users] scaling problem with openmpi

2009-05-20 Thread Pavel Shamis (Pasha)
Disabling basic_linear seems like a good idea but your config file sets the cut-off at 128 Bytes for 64-ranks (the field you set to 8192 seems to result in a message size of that value divided by the number of ranks). In my testing bruck seems to win clearly (at least for 64 ranks on my IB) u

Re: [OMPI users] scaling problem with openmpi

2009-05-20 Thread Pavel Shamis (Pasha)
The correct MCA parameters are the following: -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_dynamic_rules_filename ./dyn_rules Ohh..it was my mistake You can also run the following command: ompi_info -mca coll_tuned_use_dynamic_rules 1 -param coll tuned This will give some insight
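The two parameters quoted above can be put together as in the following minimal sketch. It only prints the resulting mpirun command line (a dry run) and, if ompi_info happens to be installed, runs the query suggested in the message. The process count, the application name ./my_app, and the helper name tuned_flags are hypothetical placeholders, not taken from the thread.

```shell
#!/bin/sh
# Sketch only: NP, APP, and the tuned_flags helper are hypothetical placeholders.
RULES_FILE=./dyn_rules
NP=48
APP=./my_app

# Build the two MCA options quoted in the message for a given rules file.
tuned_flags() {
    printf '%s' "-mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_dynamic_rules_filename $1"
}

# Dry run: print the mpirun command line instead of executing it.
echo "mpirun $(tuned_flags "$RULES_FILE") -np $NP $APP"

# If Open MPI is installed, list the tuned collective parameters as suggested.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info -mca coll_tuned_use_dynamic_rules 1 -param coll tuned || true
fi
```

Printing the command first makes it easy to check the flag spelling before launching a long job.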

Re: [OMPI users] scaling problem with openmpi

2009-05-20 Thread Pavel Shamis (Pasha)
Default algorithm thresholds in mvapich are different from ompi. Using tuned collectives in Open MPI you may configure the Open MPI Alltoall threshold to match the Mvapich defaults. The following mca parameters configure Open MPI to use custom rules that are defined in a configuration (txt) file. "--mca use_dynam

Re: [OMPI users] scaling problem with openmpi

2009-05-18 Thread Pavel Shamis (Pasha)
. Pasha. Roman Martonak wrote: I've been using --mca mpi_paffinity_alone 1 in all simulations. Concerning "-mca mpi_leave_pinned 1", I tried it with openmpi 1.2.X versions and it makes no difference. Best regards Roman On Mon, May 18, 2009 at 4:57 PM, Pavel Shamis (Pasha) wrot

Re: [OMPI users] scaling problem with openmpi

2009-05-18 Thread Pavel Shamis (Pasha)
Actually, for the 1.2.X versions I would recommend enabling leave pinned: "-mca mpi_leave_pinned 1" > 1) I was told to add "-mca mpi_leave_pinned 0" to avoid problems with Infiniband. This was with OpenMPI 1.3.1. Not sure if the problems were fixed on 1.3.2, but I am hanging on to that setting j

Re: [OMPI users] Problems with "error polling LP CQ with status RNR"

2009-05-14 Thread Pavel Shamis (Pasha)
RNR (receiver not ready) means that on the receive side MPI doesn't have buffers to receive the data. It may point to some broken configuration in MPI/ofud or a credit leak in the OFUD code. Åke Sandgren wrote: Hi! I'm having problem with getting the "error polling LP CQ with status RNR..." on an otherw

Re: [OMPI users] Slightly off topic: Ethernet and InfiniBand speed evolution

2009-05-07 Thread Pavel Shamis (Pasha)
The (low level verbs) latency has AFAIR changed only a few times: 1) started at 5-6us with PCI-X Infinihost3 2) dropped to 3-4us with PCI-express Infinihost3 3) dropped to ~1us with PCI-express ConnectX I would like to add that on PCI-EX Gen2 platforms the latency is sub micro (~0.8-0.95)

Re: [OMPI users] Slightly off topic: Ethernet and InfiniBand speed evolution

2009-05-05 Thread Pavel Shamis (Pasha)
I can't find a similar data set for Infiniband. I would appreciate any comment/links. Here is the IB roadmap http://www.infinibandta.org/itinfo/IB_roadmap ...But I do not see SDR there. Pasha

Re: [OMPI users] users Digest, Vol 1217, Issue 2, Message3

2009-05-05 Thread Pavel Shamis (Pasha)
Jan, I guess that you have the OFED driver installed on your machines. You may do basic network verification with the ibdiagnet utility (http://linux.die.net/man/1/ibdiagnet) that is part of the OFED installation. Regards, Pasha Jeff Squyres wrote: On May 4, 2009, at 9:50 AM, jan wrote: Thank you Jef

Re: [OMPI users] [Fwd: mpi alltoall memory requirement]

2009-04-26 Thread Pavel Shamis (Pasha)
You may try to use XRC; it should decrease the openib BTL memory footprint, especially on a multi-core system like yours. The following option will switch the default OMPI config to XRC: "--mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32" Regard

Re: [OMPI users] mlx4 error - looking for guidance

2009-03-05 Thread Pavel Shamis (Pasha)
The fw version 2.3.0 is too old. I recommend upgrading to the latest version (2.6.0) from the Mellanox website http://www.mellanox.com/content/pages.php?pg=firmware_table_ConnectXIB Thanks, Pasha Jeff Layton wrote: Oops. I ran it on the head node and not the compute node. Here is the output

Re: [OMPI users] mlx4 error - looking for guidance

2009-03-05 Thread Pavel Shamis (Pasha)
Do you have the same HCA adapter type on all of your machines? In the error log I see an mlx4 error message, and mlx4 is the ConnectX driver, but ibv_devinfo shows some older HCA. Pasha Jeff Layton wrote: Pasha, Here you go... :) Thanks for looking at this. Jeff hca_id: mthca0 fw_ver:

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-05 Thread Pavel Shamis (Pasha)
Thanks Pasha! ibdiagnet reports the following: -I--- -I- IPoIB Subnets Check -I--- -I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00 -W- Port localhost/P1 lid=0x00e2 guid=

Re: [OMPI users] mlx4 error - looking for guidance

2009-03-05 Thread Pavel Shamis (Pasha)
Jeff, Can you please provide more information about your HCA type (ibv_devinfo -v)? Do you see this error immediately during startup, or do you get it during your run? Thanks, Pasha Jeff Layton wrote: Evening everyone, I'm running a CFD code on IB and I've encountered an error I'm not sure about

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-05 Thread Pavel Shamis (Pasha)
Time to dig up diagnostics tools and look at port statistics. You may use the ibdiagnet tool (http://linux.die.net/man/1/ibdiagnet) for network debugging. This tool is part of OFED. Pasha.

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Pavel Shamis (Pasha)
Usually "retry exceeded error" points to some network issue, like a bad cable or a bad connector. You may use the ibdiagnet tool (http://linux.die.net/man/1/ibdiagnet) for network debugging. This tool is part of OFED. Pasha Brett Pemberton wrote: Hey, I've had a couple of errors recently, of

Re: [OMPI users] BTL question

2008-12-29 Thread Pavel Shamis (Pasha)
You may specify: --mca btl openib,sm,self
Application sometime runs fast, sometimes runs slow
When you specify the parameter above, Open MPI will use only three BTLs:
openib - for Infiniband
sm - for shared memory communication
self - for "self" communication
No other BTL will be used. And Op
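As an illustration of the option above, here is a small sketch that prints (rather than runs) a complete mpirun command line restricted to those three BTLs. The helper name btl_cmd, the process count, and ./my_app are hypothetical and only stand in for a real job.

```shell
#!/bin/sh
# Sketch only: btl_cmd, the process count, and ./my_app are hypothetical.
# Print (do not run) an mpirun command line restricted to the three BTLs above.
btl_cmd() {
    np=$1
    app=$2
    printf 'mpirun --mca btl openib,sm,self -np %s %s\n' "$np" "$app"
}

btl_cmd 4 ./my_app
```

With an explicit list like this, a job that cannot use InfiniBand should fail instead of silently falling back to another transport, which is consistent with the statement that no other BTL will be used.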

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-28 Thread Pavel Shamis (Pasha)
Another thing to try is a change that we made late in the Open MPI v1.2 series with regards to IB: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion Thanks, this is something worth investigating. What would be the exact syntax to use to turn off pml_ob1_use_

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Pavel Shamis (Pasha)
If the basic test runs, the installation is OK. So what happens when you try to run your application? What is the command line? What is the error message? Do you run the application on the same set of machines with the same command line as IMB? Pasha yes to both questions: the OMPI version is

Re: [OMPI users] BTL question

2008-12-24 Thread Pavel Shamis (Pasha)
Teige, Scott W wrote: Greetings, I have observed strange behavior with an application running with OpenMPI 1.2.8, OFED 1.2. The application runs in two "modes", fast and slow. The execution time is either within one second of 108 sec. or within one second of 67 sec. My cluster has 1 Gig etherne

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Pavel Shamis (Pasha)
Biagio Lucini wrote: Hello, I am new to this list, where I hope to find a solution for a problem that I have been having for quite a long time. I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster with Infiniband interconnects that I use and administer at the same time. The o

Re: [OMPI users] infiniband problem

2008-11-23 Thread Pavel Shamis (Pasha)
recommend you upgrade your Open MPI installation. v1.2.8 has a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be available "next month"... so watch for an announcement on that front. BTW OMPI 1.2.8 also will be available as part of OFED 1.4 that will be released in end of th

Re: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Pavel Shamis (Pasha)
_ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- -- Pavel Shamis (Pasha) Mellanox Technologies LTD.

Re: [OMPI users] Problem with btl_openib_endpoint_post_rr

2008-08-26 Thread Pavel Shamis (Pasha)
Hi, Can you please provide more information about your setup: - OpenMPI version - Runtime tuning - Platform - IB vendor and driver version Thanks, Pasha Åke Sandgren wrote: Hi! We have a code that (at least sometimes) gets the following error message: [p-bc2909][0,1,98][btl_openib_endpoint.h:2

Re: [OMPI users] Fail to install openmpi 1.2.5 on bladecenter with OFED 1.3

2008-08-13 Thread Pavel Shamis (Pasha)
Usually OFED installs only the 64-bit version of libibverbs. If you want to install both the 32-bit and 64-bit versions, you need to pass the "--build32" flag to the OFED installer. After reinstalling OFED with 32-bit support, you may rebuild OMPI with 32-bit support. Regards, Pasha Mohd Radzi Nurul Azri wrote: Hi

Re: [OMPI users] How can I start building apps in Open MPI? any docs?

2008-07-27 Thread Pavel Shamis (Pasha)
Amir Saad wrote: I'll be starting some parallel programs in Open MPI and I would like to find a guide or any docs of Open MPI, any suggestions please? I couldn't find any docs on the website, how do I know about the APIs or the functions that I should use? Here are videos about OpenMPI/MPI -

Re: [OMPI users] OpenMPI locking up only on IB

2008-07-03 Thread Pavel Shamis (Pasha)
with Intel and Pgi compilers: http://www.mellanox.com/products/ofed.php Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Jul 3, 2008, at 8:38 AM, Jeff Squyres wrote: On Jul 2, 2008, at 11:51 PM, Pavel Shamis (Pasha) wrote: In trying to build

Re: [OMPI users] OpenMPI locking up only on IB

2008-07-03 Thread Pavel Shamis (Pasha)
there a way to shut off early completion in 1.2.3? Sure, just add "--mca pml_ob1_use_early_completion 0" to your command line. Or is the above a known issue and I should use 1.2.7-pre or grab a 1.3 snapshot? 1.2.6 should be ok. Regards, Pasha On Jul 2, 2008, at 10:42 AM, Pa

Re: [OMPI users] OpenMPI locking up only on IB

2008-07-02 Thread Pavel Shamis (Pasha)
Maybe this FAQ will help: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion Brock Palen wrote: We have a code (arts) that locks up only when running on IB. Works fine on tcp and sm. When we ran it in a debugger. It locked up on a MPI_Comm_split() That as far a

Re: [OMPI users] Fw: Re: Open MPI timeout problems.

2008-06-19 Thread Pavel Shamis (Pasha)
hu, 6/19/08, Pavel Shamis (Pasha) wrote: From: Pavel Shamis (Pasha) Subject: Re: [OMPI users] Open MPI timeout problems. To: pj...@cornell.edu, "Open MPI Users" Date: Thursday, June 19, 2008, 5:20 AM Usually the retry exceeded error points to some network issue on y

Re: [OMPI users] Open MPI timeout problems.

2008-06-19 Thread Pavel Shamis (Pasha)
Usually the retry exceeded error points to some network issue on your cluster. I see from the logs that you still use MVAPI. If I remember correctly, MVAPI includes the IBADM application, which should be able to check and debug the network. BTW I recommend you update your MVAPI driver to the latest OpenFabric dri

Re: [OMPI users] OpenMPI scaling > 512 cores

2008-06-04 Thread Pavel Shamis (Pasha)
Scott Shaw wrote: Hi, I hope this is the right forum for my questions. I am running into a problem when scaling >512 cores on a infiniband cluster which has 14,336 cores. I am new to openmpi and trying to figure out the right -mca options to pass to avoid the "mca_oob_tcp_peer_complete_connect:

Re: [OMPI users] infiniband

2008-05-01 Thread Pavel Shamis (Pasha)
1 5 123391 Pavel Shamis (Pasha) wrote: SLIM H.A. wrote: Is it possible to get information about the usage of hca ports similar to the result of the mx_endpoint_info command for Myrinet boards? The ibstat command gives information like this: Port 1: State: Active Physical state: LinkUp

Re: [OMPI users] infiniband

2008-04-29 Thread Pavel Shamis (Pasha)
using an infiniband port or communicates through plain ethernet. I would be grateful for any advice You have access to some counters in /sys/class/infiniband/mlx4_0/ports/1/counters/ (counters for HCA mlx4_0, port 1) -- Pavel Shamis (Pasha) Mellanox Technologies
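A minimal sketch of reading those counters from the shell is shown below. The directory is the one named in the message. The helper name print_ib_counters is hypothetical, and the device name and port number will differ on other systems. Taking one snapshot before and one after a job and comparing the values is one way to see whether traffic actually went over the HCA port.

```shell
#!/bin/sh
# Sketch only: print_ib_counters is a hypothetical helper name.
# Print every counter for one HCA port as "name: value".
print_ib_counters() {
    dir=${1:-/sys/class/infiniband/mlx4_0/ports/1/counters}
    if [ ! -d "$dir" ]; then
        echo "no counters directory at $dir" >&2
        return 1
    fi
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Uses the path from the message; harmless on machines without that HCA.
print_ib_counters || true
```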

Re: [OMPI users] multi-rail failover with IB

2008-04-03 Thread Pavel Shamis (Pasha)
d second one will be reserved for back-up. On network failure on the first port all connections will migrate to the second port. APM works only at the HCA level - I mean that you cannot migrate between different HCAs; you can migrate only between 2 ports of the same HCA. -- Pavel Shamis (Pasha) Mellanox Technologies

Re: [OMPI users] MPI-2 Supported on Open MPI 1.2.5?

2008-03-12 Thread Pavel Shamis (Pasha)
-- Pavel Shamis (Pasha) Mellanox Technologies

Re: [OMPI users] Set GID

2008-03-12 Thread Pavel Shamis (Pasha)
Ok, I will do. Jeff Squyres wrote: Sure, that would be fine. Can you write it up in a little more FAQ-ish style? I can add it to the web page. See this wiki item: https://svn.open-mpi.org/trac/ompi/wiki/OMPIFAQEntries On Mar 12, 2008, at 5:33 AM, Pavel Shamis (Pasha) wrote: Run

Re: [OMPI users] Set GID

2008-03-12 Thread Pavel Shamis (Pasha)
above, but I cannot seem to find anywhere that will tell me how to change the GID to something else. Thanks, Jon -- Pavel Shamis (Pasha) Mellanox Technologies

Re: [OMPI users] job running question

2006-04-10 Thread Pavel Shamis (Pasha)
Mpirun opens a separate shell on each machine/node, so the "ulimit" setting will not be available in the new shell. I think if you add "ulimit -c unlimited" to your default shell configuration file (~/.bashrc in the BASH case and ~/.tcshrc in the TCSH/CSH case) you will find your core files
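The suggestion above could be applied with a small helper like the one sketched here for the Bash case. The function name enable_core_dumps is hypothetical, and the demonstration writes to a scratch file so that nothing in the home directory is touched. Note that csh/tcsh use the builtin "limit coredumpsize unlimited" rather than "ulimit", so the line appended here is only suitable for sh-style startup files.

```shell
#!/bin/sh
# Sketch only: enable_core_dumps is a hypothetical helper name.
# Append "ulimit -c unlimited" to a shell startup file unless it is already there.
enable_core_dumps() {
    rcfile=$1
    line='ulimit -c unlimited'
    touch "$rcfile"
    if ! grep -qxF "$line" "$rcfile"; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}

# Demonstrate on a scratch file; pass "$HOME/.bashrc" to change the real file.
demo_rc=$(mktemp)
enable_core_dumps "$demo_rc"
enable_core_dumps "$demo_rc"   # second call is a no-op
cat "$demo_rc"
rm -f "$demo_rc"
```

Checking for the line before appending keeps the startup file clean if the helper is run on every node more than once.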