Re: [OMPI users] performance of MPI_Iallgatherv

2014-04-07 Thread Nathan Hjelm
There is no async progress in Open MPI at this time so this is the expected behavior. We plan to fix this for the 1.9 release series. -Nathan Hjelm HPC-5, LANL On Mon, Apr 07, 2014 at 11:12:06AM +0800, Zehan Cui wrote: > Hi Matthieu, > > Thanks for your suggestion. I tried MPI_Waita

Re: [OMPI users] OpenMPI with Gemini Interconnect

2014-04-16 Thread Nathan Hjelm
more info on the XE, XK, and XC support feel free to ask on this list and I will try to get an answer back quickly. -Nathan Hjelm HPC-5, LANL On Wed, Apr 16, 2014 at 05:01:37PM -0400, Ray Sheppard wrote: >Hello, > Big Red 2 provides its own MPICH based MPI. The only case whe

Re: [OMPI users] MPI one-sided communication questions

2014-04-24 Thread Nathan Hjelm
the pointer component > points to. That address I would then like to use for MPI_Put/MPI_Get > - without support of the remote side and, in particular, without > calling a collective on all processes. Any idea how to do this? This is possible if the window was created with MPI_Win_crea

Re: [OMPI users] Question about scheduler support

2014-05-15 Thread Nathan Hjelm
design and implement it. > > > Please allow me to chip in my $0.02 and suggest to not reinvent the wheel, > but instead consider to migrate the build system to cmake : Umm, no. IMHO, CMake has its own set of issues. So, it's likely not going to happen. -Nathan Hjelm HPC-5, LANL

Re: [OMPI users] divide-by-zero in mca_btl_openib_add_procs

2014-05-27 Thread Nathan Hjelm
On Wed, May 28, 2014 at 12:32:35AM +0200, Alain Miniussi wrote: > Unfortunately, the debug library works like a charm (which make the > uninitialized variable issue more likely). > > Still, the stack trace point to mca_btl_openib_add_procs in > ompi/mca/btl/openib/btl_openib.c and there is only on

Re: [OMPI users] Valgrind reports lots of memory leakage

2014-05-30 Thread Nathan Hjelm
We are aware of the problem and many of these leaks are already fixed in the trunk and 1.8.2 nightlies. -Nathan Hjelm HPC-5, LANL On Fri, May 30, 2014 at 12:19:15PM -0700, W Spector wrote: > Hi, > > I have been doing a lot of testing/fixing lately on our code, using valgrind > to f

Re: [OMPI users] Compiling OpenMPI 1.8.1 for Cray XC30

2014-06-09 Thread Nathan Hjelm
I have a platform file for the XC30 that I haven't yet pushed to the repository. I will try to push it later today. -Nathan On Thu, Jun 05, 2014 at 04:00:03PM +, Hammond, Simon David (-EXP) wrote: > Hi OpenMPI developers/users, > > Does anyone have a working configure line for OpenMPI 1.8.1

Re: [OMPI users] openib segfaults with Torque

2014-06-10 Thread Nathan Hjelm
On Tue, Jun 10, 2014 at 12:10:28AM +, Jeff Squyres (jsquyres) wrote: > I seem to recall that you have an IB-based cluster, right? > > From a *very quick* glance at the code, it looks like this might be a simple > incorrect-finalization issue. That is: > > - you run the job on a single serve

Re: [OMPI users] openib segfaults with Torque

2014-06-10 Thread Nathan Hjelm
been up? -Nathan Hjelm Application Readiness, HPC-5, LANL On Tue, Jun 10, 2014 at 02:06:54PM -0400, Fischer, Greg A. wrote: > Jeff/Nathan, > > I ran the following with my debug build of OpenMPI 1.8.1 - after opening a > terminal on a compute node with "qsub -l nodes 2 -I"

Re: [OMPI users] openib segfaults with Torque

2014-06-10 Thread Nathan Hjelm
Out of curiosity what is the mlock limit on your system? If it is too low that can cause ibv_create_cq to fail. To check run ulimit -m. -Nathan Hjelm Application Readiness, HPC-5, LANL On Tue, Jun 10, 2014 at 02:53:58PM -0400, Fischer, Greg A. wrote: > Yes, this fails on all nodes on the sys

Re: [OMPI users] openib segfaults with Torque

2014-06-11 Thread Nathan Hjelm
PM, "Fischer, Greg A." > wrote: > > > Is there any other work around that I might try? Something that > avoids UDCM? > > > > -Original Message- > > From: Fischer, Greg A. > > Sent: Tue

Re: [OMPI users] MPI_T Control Variables

2014-07-11 Thread Nathan Hjelm
Can you try with a 1.8.2 nightly tarball or the trunk? I fixed a couple of bugs that varlist discovered (also found some in varlist). -Nathan Hjelm HPC-5, LANL On Fri, Jul 11, 2014 at 04:42:01PM +, Gallardo, Esthela wrote: >Hi, > >I am new to the MPI_T interface, and was

Re: [OMPI users] MPI_T Control Variables

2014-07-11 Thread Nathan Hjelm
The current nightly tarball can be found at http://www.open-mpi.org/nightly/v1.8/openmpi-1.8.2a1r32209.tar.gz -Nathan Hjelm HPC-5, LANL On Fri, Jul 11, 2014 at 05:04:07PM +, Gallardo, Esthela wrote: > Hi Nathan, > > Where can I access the 1.8.2 tarball? I'm not sure if you me

Re: [OMPI users] MPI_T Control Variables

2014-07-11 Thread Nathan Hjelm
ou meant to include it > as an attachment. If so, then it did not go through. > > Thank you, > > Esthela Gallardo > ____ > From: users on behalf of Nathan Hjelm > > Sent: Friday, July 11, 2014 10:50 AM > To: Open MPI Users >

Re: [OMPI users] MPI_T Control Variables

2014-07-11 Thread Nathan Hjelm
Ignore that. Their version is ok. The one I have looks like it is out of date. Just tested theirs with trunk. -Nathan On Fri, Jul 11, 2014 at 11:27:42AM -0600, Nathan Hjelm wrote: > > Hmm, looks like the varlist fixes I provided to LLNL haven't made it > into their git repo. Us

Re: [OMPI users] latest stable and win7/msvc2013

2014-07-16 Thread Nathan Hjelm
It likely won't build because last I checked the Microsoft toolchain does not meet the minimum requirements (C99 or higher). You will have better luck with either gcc or Intel's compiler. -Nathan On Wed, Jul 16, 2014 at 04:52:53PM +0100, MM wrote: > hello, > I'm about to try to build 1.8.1 with win

Re: [OMPI users] MPI_T Control Variables

2014-07-16 Thread Nathan Hjelm
-np 16 -hostfile hosts --mca btl openib,self ./varlist > > Is this correct? > > Thank you, > > Esthela Gallardo > ________ > From: users on behalf of Nathan Hjelm > > Sent: Friday, July 11, 2014 11:33 AM > To: Open MPI Us

Re: [OMPI users] problem with mca_pml_ob1.so in openmpi-1.8.2rc2

2014-07-25 Thread Nathan Hjelm
Can you try adding the #include to pml_ob1_isend.c And see if that resolves the issue? -Nathan On Fri, Jul 25, 2014 at 07:59:21AM +0200, Siegmar Gross wrote: > Hi, > > today I tried to track down the error which I reported for > my small program (running on Solaris 10 Sparc). > > tyr hello

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Nathan Hjelm
And it doesn't support knem at this time. Probably never will because of the existence of CMA. -Nathan On Thu, Oct 16, 2014 at 01:49:09PM -0700, Ralph Castain wrote: > FWIW: vader is the default in 1.8 > > On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller wrote: > > > Are you sure you are not u

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Nathan Hjelm
On Thu, Oct 16, 2014 at 05:27:54PM -0400, Gus Correa wrote: > Thank you, Aurelien! > > Aha, "vader btl", that is new to me! > I thought Vader was that man dressed in black in Star Wars, > Obi-Wan Kenobi's nemesis. > That was a while ago, my kids were children, > and Alec Guinness younger than Harris

Re: [OMPI users] large memory usage and hangs when preconnecting beyond 1000 cpus

2014-10-21 Thread Nathan Hjelm
would suggest the stack trace analysis tool (STAT). It might help you narrow down where the problem is occurring. -Nathan Hjelm HPC-5, LANL On Tue, Oct 21, 2014 at 01:12:21PM +1100, Marshall Ward wrote: > Thanks, it's at least good to know that the behaviour isn't normal! > > Co

Re: [OMPI users] WG: Bug in OpenMPI-1.8.3: storage limition in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Nathan Hjelm
On Mon, Oct 27, 2014 at 02:15:45PM +, michael.rach...@dlr.de wrote: > Dear Gilles, > > This is the system response on the login node of cluster5: > > cluster5:~/dat> mpirun -np 1 df -h > Filesystem Size Used Avail Use% Mounted on > /dev/sda31 228G 5.6G 211G 3% / > udev

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-30 Thread Nathan Hjelm
emory-transport-in-open-mpi-now-featuring-3-flavors-of-zero-copy/ -Nathan Hjelm HPC-5, LANL On Fri, Oct 17, 2014 at 01:02:23PM -0700, Ralph Castain wrote: > On Oct 17, 2014, at 12:06 PM, Gus Correa wrote: > Hi Jeff > > Many thanks for looking into this and filing a bug

Re: [OMPI users] mmaped memory and openib btl.

2014-11-12 Thread Nathan Hjelm
You could just disable leave pinned: -mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0 This will fix the issue but may reduce performance. Not sure why the munmap wrapper is failing to execute but this will get you running. -Nathan Hjelm HPC-5, LANL On Wed, Nov 12, 2014 at 05:08:06PM
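For readers wanting to try this, a possible full command line (the application name and rank count are placeholders, only the two flags come from the message above):

    mpirun -mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0 -np 16 ./my_app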

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Nathan Hjelm
One thing that changed between 1.6 and 1.8 is the default binding policy. Open MPI 1.6 did not bind by default but 1.8 binds to core. You can unset the binding policy by adding --bind-to none. -Nathan Hjelm HPC-5, LANL On Tue, Dec 09, 2014 at 12:14:32PM -0500, Eric Chamberland wrote: >
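A hypothetical invocation showing the suggested override (program name and rank count are placeholders):

    mpirun --bind-to none -np 32 ./my_app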

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Nathan Hjelm
yield when idle is broken on 1.8. Fixing now. -Nathan On Tue, Dec 09, 2014 at 01:02:08PM -0800, Ralph Castain wrote: > Hmmm... well, it looks like we are doing the right thing and running unbound > when oversubscribed like this. I don't have any brilliant idea why it would > be running so slowly

Re: [OMPI users] MPI_THREAD_MULTIPLE hang

2014-12-10 Thread Nathan Hjelm
Several things: - In 1.8.x only shared memory windows work with multiple threads. This problem will be fixed in the master branch soon. A back-port to 1.8 is unlikely given the magnitude of the changes. - I highly recommend using the MPI-3 call MPI_Win_allocate over MPI_Win_create. Th
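A minimal C sketch of the two recommendations above, assuming an arbitrary 4 KiB window; it is not code from the original thread:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;
        char *base;
        MPI_Win win;

        /* ask for MPI_THREAD_MULTIPLE explicitly and check what was granted */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE)
            printf("MPI_THREAD_MULTIPLE not available (got %d)\n", provided);

        /* prefer MPI_Win_allocate: the library supplies the window memory */
        MPI_Win_allocate(4096, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);
        /* ... threaded RMA epochs on win ... */
        MPI_Win_free(&win);

        MPI_Finalize();
        return 0;
    }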

Re: [OMPI users] Valgrind reports a plenty of Invalid read's in osc_rdma_data_move.c

2015-01-14 Thread Nathan Hjelm
Have you turned on valgrind support in Open MPI? That is required to quiet these bogus warnings. -Nathan On Wed, Jan 14, 2015 at 10:17:50AM +, Victor Vysotskiy wrote: > Hi, > > Our parallel applications behaves strange when it is compiled with Openmpi > v1.8.4 on both Linux and Mac OS X p

Re: [OMPI users] Fail to lock/unlock a shared memory window iteratively

2015-02-12 Thread Nathan Hjelm
There was a bug in the MPI_MODE_NOCHECK path in osc/sm. It has been fixed on master and a fix has been CMRed to 1.8. Thank you for reporting this. In the meantime you can remove MPI_MODE_NOCHECK and it should work fine. -Nathan On Thu, Feb 12, 2015 at 11:10:59PM +0100, Thibaud Kloczko wrote: >
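A small sketch of the interim workaround (target rank, window, and the enclosing program are placeholders, not from the thread):

    /* pass assert 0 instead of MPI_MODE_NOCHECK until the fix lands */
    MPI_Win_lock(MPI_LOCK_SHARED, target_rank, 0 /* was MPI_MODE_NOCHECK */, win);
    /* ... MPI_Put / MPI_Get on win ... */
    MPI_Win_unlock(target_rank, win);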

Re: [OMPI users] Help on getting CMA works

2015-02-18 Thread Nathan Hjelm
I recommend using vader for CMA. It has code to get around the ptrace setting. Run with mca_btl_vader_single_copy_mechanism cma (should be the default). -Nathan On Wed, Feb 18, 2015 at 02:56:01PM -0500, Eric Chamberland wrote: > Hi, > > I have configured with "--with-cma" on 2 differents OS (Re
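An illustrative command line putting that together; the explicit btl selection and program name are assumptions, and the flag below is the command-line form of the parameter named above:

    mpirun --mca btl vader,self --mca btl_vader_single_copy_mechanism cma -np 2 ./my_app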

Re: [OMPI users] Help on getting CMA works

2015-02-19 Thread Nathan Hjelm
> > On both RedHat 6.5 and OpenSuse 12.3 and still get the same error message!!! > :-/ > > Sorry, I am not a kernel expert... > > What's wrong? > > Thanks, > > Eric > > On 02/18/2015 04:48 PM, Éric Chamberland wrote: > > > >Le 2015-02-18 15

Re: [OMPI users] Help on getting CMA works

2015-02-19 Thread Nathan Hjelm
On Thu, Feb 19, 2015 at 12:16:49PM -0500, Eric Chamberland wrote: > > On 02/19/2015 11:56 AM, Nathan Hjelm wrote: > > > >If you have yama installed you can try: > > Nope, I do not have it installed... is it absolutely necessary? (and would > it change something w

Re: [OMPI users] Help on getting CMA works

2015-02-19 Thread Nathan Hjelm
, Eric Chamberland wrote: > On 02/19/2015 02:58 PM, Nathan Hjelm wrote: > >On Thu, Feb 19, 2015 at 12:16:49PM -0500, Eric Chamberland wrote: > >> > >>On 02/19/2015 11:56 AM, Nathan Hjelm wrote: > >>> > >>>If you have yama installed you can try:

Re: [OMPI users] Help on getting CMA works

2015-02-19 Thread Nathan Hjelm
el: +1 (865) 974-9375 fax: +1 (865) 974-8296 > https://icl.cs.utk.edu/~bouteill/ > > > > > > Le 19 févr. 2015 à 15:53, Nathan Hjelm a écrit : > > > > > > Great! I will add an MCA variable to force CMA and also enable it if 1) > > no yama and

Re: [OMPI users] Help on getting CMA works

2015-02-19 Thread Nathan Hjelm
Hmm, wait. Yes. Your change went in after 1.8.4 and has the same effect. If yama isn't installed it is safe to assume that the ptrace scope is effectively 0. So, your patch does fix the issue. -Nathan On Thu, Feb 19, 2015 at 02:53:47PM -0700, Nathan Hjelm wrote: > > I don't thi

Re: [OMPI users] Help on getting CMA works

2015-02-19 Thread Nathan Hjelm
Aurélien, I should also point out your fix has already been applied to the 1.8 branch and will be included in 1.8.5. -Nathan On Thu, Feb 19, 2015 at 02:57:38PM -0700, Nathan Hjelm wrote: > > Hmm, wait. Yes. Your change went in after 1.8.4 and has the same > effect. If yama isn'

Re: [OMPI users] Several Bcast calls in a loop causing the code to hang

2015-02-23 Thread Nathan Hjelm
Josh, do you see a hang when using vader? It is preferred over the old sm btl. -Nathan On Mon, Feb 23, 2015 at 03:48:17PM -0500, Joshua Ladd wrote: >Sachin, > >I am able to reproduce something funny. Looks like your issue. When I run >on a single host with two ranks, the test works

Re: [OMPI users] Help on getting CMA works

2015-02-24 Thread Nathan Hjelm
Eric Chamberland wrote: > Maybe it is a stupid question, but... why it is not tested and enabled by > default at configure time since it is part of the kernel? > > Eric > > > On 02/19/2015 03:53 PM, Nathan Hjelm wrote: > >Great! I will add an MCA variable to force CMA and also

Re: [OMPI users] Questions regarding xpmem

2015-03-16 Thread Nathan Hjelm
What program are you using for the benchmark? Are you using the xpmem branch in my github? For my testing I used a stock ubuntu 3.13 kernel but I have not fully stress-tested my xpmem branch. I will see if I can reproduce and fix the hang. -Nathan On Mon, Mar 16, 2015 at 05:32:26PM +0100, Tobias

Re: [OMPI users] Questions regarding xpmem

2015-03-17 Thread Nathan Hjelm
9. >openmpi and pw was build with the intel compilers, xpmem with gcc. > >Kind regards, >Tobias > >On 03/16/2015 05:56 PM, Nathan Hjelm wrote: > > What program are you using for the benchmark? Are you using the xpmem > branch in my github? For my test

Re: [OMPI users] Questions regarding xpmem

2015-03-17 Thread Nathan Hjelm
>Kind regards, >Tobias > >On 03/16/2015 05:56 PM, Nathan Hjelm wrote: > > What program are you using for the benchmark? Are you using the xpmem > branch in my github? For my testing I used a stock ubuntu 3.13 kernel > but I have not fully stress-tested my xpm

Re: [OMPI users] Eager sending on InfiniBand

2016-05-16 Thread Nathan Hjelm
benefit to using per-peer queue pairs and they do not scale. -Nathan Hjelm HPC-ENV, LANL On Mon, May 16, 2016 at 12:21:41PM -0400, Xiaolong Cui wrote: >Hi, >I am using Open MPI 1.8.6. I guess my question is related to the flow >control algorithm for small messages. The question

Re: [OMPI users] Eager sending on InfiniBand

2016-05-17 Thread Nathan Hjelm
credits? >Best, >Michael > On Mon, May 16, 2016 at 6:35 PM, Nathan Hjelm wrote: > > When using eager_rdma the sender will block once it runs out of > "credits". If the receiver enters MPI for any reason the incoming > messages will be p

Re: [OMPI users] Eager sending on InfiniBand

2016-05-17 Thread Nathan Hjelm
s gone. But >removing the per-peer queue pair does not help. >Do you know any document that discusses the open mpi internals, especially >related to this problem? >On Tue, May 17, 2016 at 11:00 AM, Nathan Hjelm wrote: > > If it is blocking on the first message th

Re: [OMPI users] How to see the output from OPAL_OUTPUT_VERBOSE?

2016-05-22 Thread Nathan Hjelm
You use the *_base_verbose MCA variables. For example, if you want to see output from the btl use -mca btl_base_verbose x. The number x controls the verbosity level. Starting with 2.x there are named levels, but not many components conform to the names yet. In general components use numbers between
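For example, a hypothetical run that turns on btl output (program name is a placeholder):

    mpirun --mca btl_base_verbose 100 -np 2 ./my_app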

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Nathan Hjelm
That message is coming from udcm in the openib btl. It indicates some sort of failure in the connection mechanism. It can happen if the listening thread no longer exists or is taking too long to process messages. -Nathan On Jun 14, 2016, at 12:20 PM, Ralph Castain wrote: Hmm... I'm unable to r

Re: [OMPI users] "failed to create queue pair" problem, but settings appear OK

2016-06-15 Thread Nathan Hjelm
You ran out of queue pairs. There is no way around this for larger all-to-all transfers when using the openib btl and SRQ. Need O(cores^2) QPs to fully connect with SRQ or PP QPs. I recommend using XRC instead by adding: btl_openib_receive_queues = X,4096,1024:X,12288,512:X,65536,512 to your o
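The same setting can also be passed on the command line; an illustrative (not verbatim) example, with rank count and program name made up:

    mpirun --mca btl_openib_receive_queues "X,4096,1024:X,12288,512:X,65536,512" -np 512 ./my_app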

Re: [OMPI users] "failed to create queue pair" problem, but settings appear OK

2016-06-15 Thread Nathan Hjelm
ibv_devinfo -v -Nathan On Jun 15, 2016, at 12:43 PM, "Sasso, John (GE Power, Non-GE)" wrote: QUESTION: Since the error said the system may have run out of queue pairs, how do I determine the # of queue pairs the IB HCA can support? -Original Message- From: users [mailto:users-boun.

Re: [OMPI users] "failed to create queue pair" problem, but settings appear OK

2016-06-16 Thread Nathan Hjelm
that an upper bound on the number of nodes would be 392632 / 24^2 ~ 681 > nodes. This does not make sense, because I saw the QP creation failure error > (again, NO error about failure to register enough memory) for as small as 177 > 24-core nodes! I don’t know how to make sense of thi

Re: [OMPI users] Error with Open MPI 2.0.0: error obtaining device attributes for mlx5_0 errno says Cannot allocate memory

2016-07-13 Thread Nathan Hjelm
As of 2.0.0 we now support experimental verbs. It looks like one of the calls is failing: #if HAVE_DECL_IBV_EXP_QUERY_DEVICE device->ib_exp_dev_attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1; if(ibv_exp_query_device(device->ib_dev_context, &device->ib_exp_dev_attr)){ BTL_ERROR(

Re: [OMPI users] Forcing TCP btl

2016-07-19 Thread Nathan Hjelm
You probably will also want to run with -mca pml ob1 to make sure mxm is not in use. The combination should be sufficient to force tcp usage. -Nathan > On Jul 18, 2016, at 10:50 PM, Saliya Ekanayake wrote: > > Hi, > > I read in a previous thread > (https://www.open-mpi.org/community/lists/us
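A possible combined command line, assuming the tcp/self btl selection referenced from the earlier thread (program name is a placeholder):

    mpirun --mca pml ob1 --mca btl tcp,self -np 4 ./my_app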

Re: [OMPI users] Problems with mpirun in openmpi-1.8.1 and -2.0.0

2016-08-23 Thread Nathan Hjelm
Might be worth trying with --mca btl_openib_cpc_include udcm   and see if that works. -Nathan On Aug 23, 2016, at 02:41 AM, "Juan A. Cordero Varelaq" wrote: Hi Gilles, If I run it like this: mpirun --mca btl ^openib,usnic --mca pml ob1 --mca btl_sm_use_knem 0 -np 5 myscript.sh it works fine

Re: [OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Nathan Hjelm
There is a bug in the code that keeps the dynamic regions sorted. Should have it fixed shortly. -Nathan On Aug 25, 2016, at 07:46 AM, Christoph Niethammer wrote: Hello, The Error is not 100% reproducible for me every time but seems to disappear entirely if one excludes -mca osc ^rdma or -mc

Re: [OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Nathan Hjelm
Fixed on master. The fix will be in 2.0.2 but you can apply it to 2.0.0 or 2.0.1: https://github.com/open-mpi/ompi/commit/e53de7ecbe9f034ab92c832330089cf7065181dc.patch -Nathan On Aug 25, 2016, at 07:31 AM, Joseph Schuchart wrote: Gilles, Thanks for your fast reply. I did some last minute changes to th

Re: [OMPI users] Java-OpenMPI returns with SIGSEGV

2016-09-14 Thread Nathan Hjelm
We have a new high-speed component for RMA in 2.0.x called osc/rdma. Since the component is doing direct rdma on the target we are much more strict about the ranges. osc/pt2pt doesn't bother checking at the moment. Can you build Open MPI with --enable-debug and add -mca osc_base_verbose 100 to

Re: [OMPI users] Java-OpenMPI returns with SIGSEGV

2016-09-14 Thread Nathan Hjelm
This error was the result of a typo which caused an incorrect range check when the compare-and-swap was on a memory region less than 8 bytes away from the end of the window. We never caught this because in general no apps create a window as small as that MPICH test (4 bytes). We are adding the

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread Nathan Hjelm
FWIW it works fine for me on my MacBook Pro running 10.12 with Open MPI 2.0.1 installed through homebrew: ✗ brew -v Homebrew 1.0.0 (git revision c3105; last commit 2016-09-22) Homebrew/homebrew-core (git revision 227e; last commit 2016-09-22) ✗ brew info openmpi open-mpi: stable 2.0.1 (bottled)

Re: [OMPI users] OS X + Xcode 8 : dyld: Symbol not found: _clock_gettime

2016-10-03 Thread Nathan Hjelm
I didn't think we even used clock_gettime() on Linux in 1.10.x. A quick check of the git branch confirms that. ompi-release git:(v1.10) ✗ find . -name '*.[ch]' | xargs grep clock_gettime ompi-release git:(v1.10) ✗ -Nathan On Oct 03, 2016, at 10:50 AM, George Bosilca wrote: This function is n

Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread Nathan Hjelm
UDCM does not require IPoIB. It should be working for you. Can you build Open MPI with --enable-debug and run with -mca btl_base_verbose 100 and create a gist with the output. -Nathan On Nov 01, 2016, at 07:50 AM, Sergei Hrushev wrote: I haven't worked with InfiniBand for years, but I do be

Re: [OMPI users] Follow-up to Open MPI SC'16 BOF

2016-11-23 Thread Nathan Hjelm
Integration is already in the 2.x branch. The problem is the way we handle the info key is a bit of a hack. We currently pull out one info key and pass it down to the mpool as a string. Ideally we want to just pass the info object so each mpool can define its own info keys. That requires the inf

Re: [OMPI users] rdmacm and udcm failure in 2.0.1 on RoCE

2016-12-14 Thread Nathan Hjelm
Can you configure with --enable-debug and run with --mca btl_base_verbose 100 and provide the output? It may indicate why neither udcm nor rdmacm are available. -Nathan > On Dec 14, 2016, at 2:47 PM, Dave Turner wrote: > >

Re: [OMPI users] openmpi single node jobs using btl openib

2017-02-07 Thread Nathan Hjelm
That backtrace shows we are registering MPI_Alloc_mem memory with verbs. This is expected behavior but it doesn’t show the openib btl being used for any communication. I am looking into an issue on an OmniPath system where just initializing the openib btl causes performance problems even if it is

Re: [OMPI users] MPI_THREAD_MULTIPLE: Fatal error in MPI_Win_flush

2017-02-19 Thread Nathan Hjelm
You can not perform synchronization at the same time as communication on the same target. This means if one thread is in MPI_Put/MPI_Get/MPI_Accumulate (target) you can’t have another thread in MPI_Win_flush (target) or MPI_Win_flush_all(). If your program is doing that it is not a valid MPI pr

Re: [OMPI users] openib/mpi_alloc_mem pathology

2017-03-07 Thread Nathan Hjelm
If this is with 1.10.x or older run with --mca memory_linux_disable 1. There is a bad interaction between ptmalloc2 and psm2 support. This problem is not present in v2.0.x and newer. -Nathan > On Mar 7, 2017, at 10:30 AM, Paul Kapinos wrote: > > Hi Dave, > > >> On 03/06/17 18:09, Dave Love

Re: [OMPI users] Passive target sync. support

2017-04-03 Thread Nathan Hjelm
On Apr 03, 2017, at 08:36 AM, Sebastian Rinke wrote: Dear all, I’m using passive target sync. in my code and would like to know how well it is supported in Open MPI. In particular, the code is some sort of particle tree code that uses a distributed tree and every rank gets non-local tree no

Re: [OMPI users] Passive target sync. support

2017-04-03 Thread Nathan Hjelm
certain flags to enable the hardware put/get support? Sebastian On 03 Apr 2017, at 18:02, Nathan Hjelm wrote: On Apr 03, 2017, at 08:36 AM, Sebastian Rinke wrote: Dear all, I’m using passive target sync. in my code and would like to know how well it is supported in Open MPI. In particular

Re: [OMPI users] How to Free Memory Allocated with MPI_Win_allocate()?

2017-04-24 Thread Nathan Hjelm
You don't. The memory is freed when the window is freed by MPI_Win_free (). See MPI-3.1 § 11.2.5 -Nathan On Apr 24, 2017, at 11:41 AM, Benjamin Brock wrote: How are we meant to free memory allocated with MPI_Win_allocate()?  The following crashes for me with OpenMPI 1.10.6: #include #inclu
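A minimal sketch of that answer (sizes arbitrary): the buffer returned by MPI_Win_allocate is owned by the window and must not be passed to free() or MPI_Free_mem; MPI_Win_free releases it.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        double *base;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Win_allocate(100 * sizeof(double), sizeof(double), MPI_INFO_NULL,
                         MPI_COMM_WORLD, &base, &win);
        /* ... RMA or local use of base ... */
        MPI_Win_free(&win);   /* also frees the memory behind base */
        MPI_Finalize();
        return 0;
    }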

Re: [OMPI users] How to use MPI_Win_attach() (or how to specify the 'displ' on a remote process)

2017-05-04 Thread Nathan Hjelm
This behavior is clearly specified in the standard. From MPI 3.1 § 11.2.4: In the case of a window created with MPI_WIN_CREATE_DYNAMIC, the target_disp for all RMA functions is the address at the target; i.e., the effective window_base is MPI_BOTTOM and the disp_unit is one. For dynamic windows, the
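A two-rank sketch of what the quoted paragraph implies for dynamic windows (variable names and the broadcast of the address are illustrative, not from the thread):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value = 42, buf = 0;
        MPI_Aint remote_addr = 0;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        if (rank == 1) {
            MPI_Win_attach(win, &buf, sizeof(buf));
            MPI_Get_address(&buf, &remote_addr);
        }
        /* the target's absolute address becomes the displacement (base is MPI_BOTTOM) */
        MPI_Bcast(&remote_addr, 1, MPI_AINT, 1, MPI_COMM_WORLD);

        if (rank == 0) {
            MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
            MPI_Put(&value, 1, MPI_INT, 1, remote_addr, 1, MPI_INT, win);
            MPI_Win_unlock(1, win);
        }

        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 1) {
            printf("received %d\n", buf);   /* read after the barrier; assumes the unified memory model */
            MPI_Win_detach(win, &buf);
        }
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

Run with at least two ranks, e.g. mpirun -np 2 ./dynamic_win.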

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Nathan Hjelm
Add --mca btl self,vader -Nathan > On May 19, 2017, at 1:23 AM, Gabriele Fatigati wrote: > > Oh no, by using two procs: > > > findActiveDevices Error > We found no active IB device ports > findActiveDevices Error > We found no active IB device ports > --

Re: [OMPI users] Tuning vader for MPI_Wait Halt?

2017-06-05 Thread Nathan Hjelm
Can you provide a reproducer for the hang? What kernel version are you using? Is xpmem installed? -Nathan On Jun 05, 2017, at 10:53 AM, Matt Thompson wrote: OMPI Users, I was wondering if there is a best way to "tune" vader to get around an intermittent MPI_Wait halt?  I ask because I rece

Re: [OMPI users] Tuning vader for MPI_Wait Halt?

2017-06-07 Thread Nathan Hjelm
but my desktop does not have it. So, perhaps not XPMEM related? Matt On Mon, Jun 5, 2017 at 1:00 PM, Nathan Hjelm wrote: Can you provide a reproducer for the hang? What kernel version are you using? Is xpmem installed? -Nathan On Jun 05, 2017, at 10:53 AM, Matt Thompson wrote: OMPI Users,

Re: [OMPI users] "undefined reference to `MPI_Comm_create_group'" error message when using Open MPI 1.6.2

2017-06-08 Thread Nathan Hjelm
MPI_Comm_create_group is an MPI-3.0+ function. 1.6.x is MPI-2.1. You can use the macros MPI_VERSION and MPI_SUBVERSION to check the MPI version. You will have to modify your code if you want it to work with older versions of Open MPI. -Nathan On Jun 08, 2017, at 03:59 AM, Arham Amouie via us
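A short sketch of the version-macro guard suggested above (the communicator/group choice is arbitrary):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
    #if MPI_VERSION >= 3
        MPI_Group world_group;
        MPI_Comm newcomm;
        MPI_Comm_group(MPI_COMM_WORLD, &world_group);
        /* non-collective communicator creation, available from MPI-3.0 on */
        MPI_Comm_create_group(MPI_COMM_WORLD, world_group, 0, &newcomm);
        MPI_Comm_free(&newcomm);
        MPI_Group_free(&world_group);
    #else
        printf("MPI %d.%d has no MPI_Comm_create_group\n", MPI_VERSION, MPI_SUBVERSION);
    #endif
        MPI_Finalize();
        return 0;
    }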

Re: [OMPI users] MPI_CANCEL for nonblocking collective communication

2017-06-09 Thread Nathan Hjelm
MPI 3.1 5.12 is pretty clear on the matter: "It is erroneous to call MPI_REQUEST_FREE or MPI_CANCEL for a request associated with a nonblocking collective operation." -Nathan > On Jun 9, 2017, at 5:33 AM, Markus wrote: > > Dear MPI Users and Maintainers, > > I am using openMPI in version 1.
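A minimal sketch of the rule: a nonblocking collective must be completed with MPI_Wait/MPI_Test, never cancelled or freed early (buffer and root are arbitrary):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int buf = 0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Ibcast(&buf, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);
        /* ... overlap computation here; do NOT call MPI_Cancel/MPI_Request_free on req ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Finalize();
        return 0;
    }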

Re: [OMPI users] Remote progress in MPI_Win_flush_local

2017-06-23 Thread Nathan Hjelm
This is not the intended behavior. Please open a bug on github. -Nathan On Jun 23, 2017, at 08:21 AM, Joseph Schuchart wrote: All, We employ the following pattern to send signals between processes: ``` int com_rank, root = 0; // allocate MPI window MPI_Win win = allocate_win(); // do some co

Re: [OMPI users] --enable-builtin-atomics

2017-08-01 Thread Nathan Hjelm
So far only cons. The gcc and sync builtin atomics provide slower performance on x86-64 (and possibly other platforms). I plan to investigate this as part of the investigation into requiring C11 atomics from the C compiler. -Nathan > On Aug 1, 2017, at 10:34 AM, Dave Love wrote: > > What are

Re: [OMPI users] srun and openmpi

2011-01-24 Thread Nathan Hjelm
I am seeing similar issues on our slurm clusters. We are looking into the issue. -Nathan HPC-3, LANL On Tue, 11 Jan 2011, Michael Di Domenico wrote: Any ideas on what might be causing this one? Or atleast what additional debug information someone might need? On Fri, Jan 7, 2011 at 4:03 PM, M

Re: [OMPI users] srun and openmpi

2011-01-25 Thread Nathan Hjelm
or us that only equates to one small machine, but it's still annoying. unfortunately, i don't have enough knowledge to dive into the code to help fix, but i can certainly help test On Mon, Jan 24, 2011 at 1:41 PM, Nathan Hjelm wrote: I am seeing similar issues on our slurm clusters. We a

Re: [OMPI users] qp memory allocation problem

2011-09-12 Thread Nathan Hjelm
-Nathan Hjelm Los Alamos National Laboratory On Mon, 12 Sep 2011, Samuel K. Gutierrez wrote: Hi, This problem can be  caused by a variety of things, but I suspect our default queue pair parameters (QP) aren't helping the situation :-). What happens when you add the following to your m

Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem

2011-09-12 Thread Nathan Hjelm
): options mlx4_core log_mtts_per_seg=X BTW, what was log_mtts_per_seg set to? -Nathan Hjelm Los Alamos National Laboratory

Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem

2011-09-12 Thread Nathan Hjelm
/reloading mlx4_core (after and dependent modules). -Nathan Hjelm Los Alamos National Laboratory

Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem

2011-09-12 Thread Nathan Hjelm
On Mon, 12 Sep 2011, Blosch, Edwin L wrote: It was set to 0 previously. We've set it to 4 and restarted some service and now it works. So both your and Samuel's suggestions worked. On another system, slightly older, it was defaulted to 3 instead of 0, and apparently that explains why the j

Re: [OMPI users] IB Memory Requirements, adjusting for reduced memory consumption

2012-01-12 Thread Nathan Hjelm
I would start by adjusting btl_openib_receive_queues . The default uses a per-peer QP which can eat up a lot of memory. I recommend using no per-peer and several shared receive queues. We use S,4096,1024:S,12288,512:S,65536,512 -Nathan On Thu, 12 Jan 2012, V. Ram wrote: Open MPI IB Gurus, I
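One way to make that setting persistent is the per-user MCA parameter file (the path shown is the usual default, adjust as needed):

    # ~/.openmpi/mca-params.conf
    btl_openib_receive_queues = S,4096,1024:S,12288,512:S,65536,512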

Re: [OMPI users] compilation error with pgcc Unknown switch

2012-02-16 Thread Nathan Hjelm
Abhinav, you shouldn't be using the cray wrappers to build Open MPI or anything linked against Open MPI. The Cray wrappers will automatically include lots of stuff you don't want. Use pgcc, pgcc, or icc directly. You shouldn't have any trouble running in parallel with either aprun or mpirun (or

Re: [OMPI users] compilation error with pgcc Unknown switch

2012-02-16 Thread Nathan Hjelm
run on the compute nodes of the cray cluster (it just ran on the MOM node). Therefore I have been trying to compiler OpenMPI with the cray wrappers. I will checkout the cray-xe6 version, and try to follow the instructions. Thanks! Abhinav. On Thu, Feb 16, 2012 at 8:31 AM, Nathan Hjelm wrote

Re: [OMPI users] compilation error with pgcc Unknown switch

2012-02-28 Thread Nathan Hjelm
On Mon, 27 Feb 2012, Abhinav Sarje wrote: Hi Nathan, Gus, Manju, I got a chance to try out the XE6 support build, but with no success. First I was getting this error: "PGC-F-0010-File write error occurred (temporary pragma .s file)". After searching online about this error, I saw that there i

Re: [OMPI users] compilation error with pgcc Unknown switch

2012-03-05 Thread Nathan Hjelm
cursive] Error 1 make[1]: Leaving directory `/global/u1/a/asarje/hopper/openmpi-dev-trunk/build/ompi' make: *** [all-recursive] Error 1 -- Any idea why this is happening, and how to fix it? Again, I am using the XE6 platform configuration file. Abhinav. On Wed, Feb 29, 2012 at 12:1

Re: [OMPI users] compilation error with pgcc Unknown switch

2012-03-06 Thread Nathan Hjelm
ill builds fine. On Tue, Mar 6, 2012 at 5:38 AM, Jeffrey Squyres wrote: I disabled C++ inline assembly for PGI (we already had C inline assembly for PGI). So I don't think this should have caused a new error... should it? On Mar 5, 2012, at 10:21 AM, Nathan Hjelm wrote: Try pulling a f

Re: [OMPI users] CUDA RDMA not selected by default

2012-03-19 Thread Nathan Hjelm
The selection of cm is not wrong per se. You will find that the psm mtl is much better than the openib btl for QLogic hardware. -Nathan On Mon, 19 Mar 2012, Jens Glaser wrote: Hello, I am using the latest trunk version of OMPI, in order to take advantage of the new CUDA RDMA features (smcuda

Re: [OMPI users] Open MPI on Cray XE6 / Gemini

2012-10-10 Thread Nathan Hjelm
On Wed, Oct 10, 2012 at 02:50:59PM +0200, Christoph Niethammer wrote: > Hello, > > I just tried to use Open MPI 1.7a1r27416 on a Cray XE6 system. Unfortunately > I > get the following error when I run a simple HelloWorldMPI program: > > $ pirun HelloWorldMPI > App launch reported: 2 (out of 2)

Re: [OMPI users] Open MPI on Cray XE6 / Gemini

2012-10-10 Thread Nathan Hjelm
;mpirun" and then it should work just > fine. > > > > On Wed, Oct 10, 2012 at 7:59 AM, Nathan Hjelm wrote: > > > On Wed, Oct 10, 2012 at 02:50:59PM +0200, Christoph Niethammer wrote: > > > Hello, > > > > > > I just tried to use Open MPI 1.7

Re: [OMPI users] OpenMPI at scale on Cray XK7

2013-04-22 Thread Nathan Hjelm
On Mon, Apr 22, 2013 at 03:17:16PM -0700, Mike Clark wrote: > Hi, > > I am trying to run OpenMPI on the Cray XK7 system at Oak Ridge National Lab > (Titan), and am running in an issue whereby MPI_Init seems to hang > indefinitely, but this issue only arises at large scale, e.g., when running >

Re: [OMPI users] OpenMPI at scale on Cray XK7

2013-04-23 Thread Nathan Hjelm
ove that (I have some ideas but nothing has been implemented yet). At 8192 nodes this takes less than a minute. Everything else should be fairly quick. -Nathan Hjelm HPC-3, LANL

Re: [OMPI users] OpenMPI at scale on Cray XK7

2013-04-23 Thread Nathan Hjelm
On Tue, Apr 23, 2013 at 10:17:46AM -0700, Ralph Castain wrote: > > On Apr 23, 2013, at 10:09 AM, Nathan Hjelm wrote: > > > On Tue, Apr 23, 2013 at 12:21:49PM +0400, > > wrote: > >> Hi, > >> > >> Nathan, could

Re: [OMPI users] OpenMPI at scale on Cray XK7

2013-04-24 Thread Nathan Hjelm
On Wed, Apr 24, 2013 at 05:01:43PM +0400, Derbunovich Andrei wrote: > Thank you to everybody for suggestions and comments. > > I have used relatively small number of nodes (4400). It looks like that > the main issue that I didn't disable dynamic components opening in my > openmpi build while kee

Re: [OMPI users] basic questions about compiling OpenMPI

2013-05-22 Thread Nathan Hjelm
If you are only using the C API there will be no issues. There are no guarantees with C++ or fortran. -Nathan Hjelm HPC-3, LANL On Wed, May 22, 2013 at 03:08:31PM +, Blosch, Edwin L wrote: > Apologies for not exploring the FAQ first. > > > > If I want to use Intel or PG

Re: [OMPI users] Problem building OpenMPI 1.6.4 with PGI 13.4

2013-05-29 Thread Nathan Hjelm
It works with PGI 12.x and it had better work with newer versions since offsetof is ISO C89/ANSI C. -Nathan On Wed, May 29, 2013 at 09:31:58PM +, Jeff Squyres (jsquyres) wrote: > Edwin -- > > Can you ask PGI support about this? I swear that the PGI compiler suite has > supported offsetof before

Re: [OMPI users] Re-locate OpenMPI installation on OS X

2013-08-16 Thread Nathan Hjelm
You may also need to update where the binaries and libraries look. See the man pages for otool and install_name_tool for more information. Here is a basic example: bash-3.2# otool -L libmpi.dylib libmpi.dylib: /opt/local/lib/libmpi.1.dylib (compatibility version 3.0.0, current version 3.
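A hedged example of the rest of the relocation, assuming the library was moved from /opt/local to a made-up /new/prefix (paths and binary name are placeholders):

    otool -L /new/prefix/lib/libmpi.1.dylib
    install_name_tool -id /new/prefix/lib/libmpi.1.dylib /new/prefix/lib/libmpi.1.dylib
    install_name_tool -change /opt/local/lib/libmpi.1.dylib /new/prefix/lib/libmpi.1.dylib ./my_mpi_binary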

Re: [OMPI users] What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?

2013-08-23 Thread Nathan Hjelm
gEnv-intel also works) module unload cray-mpich2 xt-libsci module load openmpi/1.7.2 -Nathan Hjelm Open MPI Team, HPC-3, LANL

Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?

2013-09-03 Thread Nathan Hjelm
Hmm, what CLE release is your development cluster running? It is the value after PrgEnv. Ex. on Cielito we have 4.1.40. 32) PrgEnv-gnu/4.1.40 We have not yet fully tested Open MPI on CLE 5.x.x. -Nathan Hjelm HPC-3, LANL On Tue, Sep 03, 2013 at 10:33:57PM +, Teranishi, Keita wrote: >

Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?

2013-09-03 Thread Nathan Hjelm
1.37969.2.32.gem 30) eswrap/1.0.8 > 15) rca/1.0.0-2.0401.38656.2.2.gem 31) craype-mc8 > 16) dvs/1.8.6_0.9.0-1.0401.1401.1.120 32) PrgEnv-gnu/4.1.40 > > > Thanks, > Keita > > > > On 9/3/13 3:42 PM, "Nathan Hjelm" wrote: > > >Hm
