There is no async progress in Open MPI at this time so this is the
expected behavior. We plan to fix this for the 1.9 release series.
-Nathan Hjelm
HPC-5, LANL
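For illustration, a minimal sketch (buffer size and ranks are arbitrary) of driving progress by hand with MPI_Test from the compute loop, since there is no progress thread:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double buf[1024] = {0};
    MPI_Request req = MPI_REQUEST_NULL;
    if (size > 1 && rank == 0)
        MPI_Isend(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
    else if (rank == 1)
        MPI_Irecv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

    /* No asynchronous progress thread: the transfer only advances while
     * this process is inside an MPI call, so poll with MPI_Test between
     * slices of computation instead of expecting overlap for free. */
    int done = 0;
    while (!done) {
        /* ... do a slice of useful computation here ... */
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}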
On Mon, Apr 07, 2014 at 11:12:06AM +0800, Zehan Cui wrote:
> Hi Matthieu,
>
> Thanks for your suggestion. I tried MPI_Waita
more info on the XE, XK, and XC support feel free to ask on
this list and I will try to get an answer back quickly.
-Nathan Hjelm
HPC-5, LANL
On Wed, Apr 16, 2014 at 05:01:37PM -0400, Ray Sheppard wrote:
>Hello,
> Big Red 2 provides its own MPICH based MPI. The only case whe
the pointer component
> points to. That address I would then like to use for MPI_Put/MPI_Get
> - without support of the remote side and, in particular, without
> calling a collective on all processes. Any idea how to do this?
This is possible if the window was created with
MPI_Win_crea
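The truncated call above is presumably MPI_Win_create_dynamic; a minimal sketch of the target side, which attaches local memory and publishes its address without any further collective:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* The dynamic window is created once (collective); after that memory
     * can be attached and detached locally, no collective required. */
    MPI_Win win;
    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    double *local = malloc(100 * sizeof(double));
    MPI_Win_attach(win, local, 100 * sizeof(double));

    /* Publish the address so a peer can target it with MPI_Put/MPI_Get. */
    MPI_Aint addr;
    MPI_Get_address(local, &addr);
    /* ... send 'addr' to the origin with a regular point-to-point message;
     *     the origin uses it as the target displacement ... */

    MPI_Win_detach(win, local);
    MPI_Win_free(&win);
    free(local);
    MPI_Finalize();
    return 0;
}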
design and implement it.
>
>
> Please allow me to chip in my $0.02 and suggest to not reinvent the wheel,
> but instead consider to migrate the build system to cmake :
Umm, no.
IMHO, CMake has its own set of issues. So, it's likely not going to happen.
-Nathan Hjelm
HPC-5, LANL
On Wed, May 28, 2014 at 12:32:35AM +0200, Alain Miniussi wrote:
> Unfortunately, the debug library works like a charm (which makes the
> uninitialized variable issue more likely).
>
> Still, the stack trace point to mca_btl_openib_add_procs in
> ompi/mca/btl/openib/btl_openib.c and there is only on
We are aware of the problem and many of these leaks are already fixed in
the trunk and 1.8.2 nightlies.
-Nathan Hjelm
HPC-5, LANL
On Fri, May 30, 2014 at 12:19:15PM -0700, W Spector wrote:
> Hi,
>
> I have been doing a lot of testing/fixing lately on our code, using valgrind
> to f
I have a platform file for the XC30 that I haven't yet pushed to the
repository. I will try to push it later today.
-Nathan
On Thu, Jun 05, 2014 at 04:00:03PM +, Hammond, Simon David (-EXP) wrote:
> Hi OpenMPI developers/users,
>
> Does anyone have a working configure line for OpenMPI 1.8.1
On Tue, Jun 10, 2014 at 12:10:28AM +, Jeff Squyres (jsquyres) wrote:
> I seem to recall that you have an IB-based cluster, right?
>
> From a *very quick* glance at the code, it looks like this might be a simple
> incorrect-finalization issue. That is:
>
> - you run the job on a single serve
been up?
-Nathan Hjelm
Application Readiness, HPC-5, LANL
On Tue, Jun 10, 2014 at 02:06:54PM -0400, Fischer, Greg A. wrote:
> Jeff/Nathan,
>
> I ran the following with my debug build of OpenMPI 1.8.1 - after opening a
> terminal on a compute node with "qsub -l nodes 2 -I&qu
Out of curiosity what is the mlock limit on your system? If it is too
low that can cause ibv_create_cq to fail. To check, run ulimit -l.
-Nathan Hjelm
Application Readiness, HPC-5, LANL
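For reference, the same limit can also be read from code; a small sketch using getrlimit (RLIMIT_MEMLOCK is what ulimit -l reports):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    /* RLIMIT_MEMLOCK is the per-process limit on locked (registered)
     * memory; "unlimited" is what you want for verbs registration. */
    if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("memlock limit: unlimited\n");
        else
            printf("memlock limit: %llu bytes\n",
                   (unsigned long long) rl.rlim_cur);
    }
    return 0;
}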
On Tue, Jun 10, 2014 at 02:53:58PM -0400, Fischer, Greg A. wrote:
> Yes, this fails on all nodes on the sys
PM, "Fischer, Greg A."
> wrote:
>
> > Is there any other work around that I might try? Something that
> avoids UDCM?
> >
> > -Original Message-
> > From: Fischer, Greg A.
> > Sent: Tue
Can you try with a 1.8.2 nightly tarball or the trunk? I fixed a couple
of bugs that varlist discovered (also found some in varlist).
-Nathan Hjelm
HPC-5, LANL
On Fri, Jul 11, 2014 at 04:42:01PM +, Gallardo, Esthela wrote:
>Hi,
>
>I am new to the MPI_T interface, and was
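For readers new to MPI_T, a minimal sketch that initializes the tools interface and counts the control variables it exposes (the kind of thing a variable-listing test exercises):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, num_cvars;
    /* The MPI_T tools interface has its own init/finalize, independent
     * of MPI_Init/MPI_Finalize. */
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_T_cvar_get_num(&num_cvars);
    printf("control variables exposed: %d\n", num_cvars);
    MPI_T_finalize();
    return 0;
}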
The current nightly tarball can be found at
http://www.open-mpi.org/nightly/v1.8/openmpi-1.8.2a1r32209.tar.gz
-Nathan Hjelm
HPC-5, LANL
On Fri, Jul 11, 2014 at 05:04:07PM +, Gallardo, Esthela wrote:
> Hi Nathan,
>
> Where can I access the 1.8.2 tarball? I'm not sure if you me
ou meant to include it
> as an attachment. If so, then it did not go through.
>
> Thank you,
>
> Esthela Gallardo
> ____
> From: users on behalf of Nathan Hjelm
>
> Sent: Friday, July 11, 2014 10:50 AM
> To: Open MPI Users
&g
Ignore that. Their version is ok. The one I have looks like it is out of
date. Just tested theirs with trunk.
-Nathan
On Fri, Jul 11, 2014 at 11:27:42AM -0600, Nathan Hjelm wrote:
>
> Hmm, looks like the varlist fixes I provided to LLNL haven't made it
> into their git repo. Us
It likely won't build because last I checked the Microsoft toolchain does
not meet the minimum requirements (C99 or higher). You will have better
luck with either gcc or intel's compiler.
-Nathan
On Wed, Jul 16, 2014 at 04:52:53PM +0100, MM wrote:
> hello,
> I'm about to try to build 1.8.1 with win
-np 16 -hostfile hosts --mca btl openib,self ./varlist
>
> Is this correct?
>
> Thank you,
>
> Esthela Gallardo
> ________
> From: users on behalf of Nathan Hjelm
>
> Sent: Friday, July 11, 2014 11:33 AM
> To: Open MPI Us
Can you try adding the
#include
to pml_ob1_isend.c
And see if that resolves the issue?
-Nathan
On Fri, Jul 25, 2014 at 07:59:21AM +0200, Siegmar Gross wrote:
> Hi,
>
> today I tried to track down the error which I reported for
> my small program (running on Solaris 10 Sparc).
>
> tyr hello
And it doesn't support knem at this time. Probably never will because of
the existence of CMA.
-Nathan
On Thu, Oct 16, 2014 at 01:49:09PM -0700, Ralph Castain wrote:
> FWIW: vader is the default in 1.8
>
> On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller wrote:
>
> > Are you sure you are not u
On Thu, Oct 16, 2014 at 05:27:54PM -0400, Gus Correa wrote:
> Thank you, Aurelien!
>
> Aha, "vader btl", that is new to me!
> I thought Vader was that man dressed in black in Star Wars,
> Obi-Wan Kenobi's nemesis.
> That was a while ago, my kids were children,
> and Alec Guinness younger than Harris
would suggest the stack trace analysis
tool (STAT). It might help you narrow down where the problem is
occurring.
-Nathan Hjelm
HPC-5, LANL
On Tue, Oct 21, 2014 at 01:12:21PM +1100, Marshall Ward wrote:
> Thanks, it's at least good to know that the behaviour isn't normal!
>
> Co
On Mon, Oct 27, 2014 at 02:15:45PM +, michael.rach...@dlr.de wrote:
> Dear Gilles,
>
> This is the system response on the login node of cluster5:
>
> cluster5:~/dat> mpirun -np 1 df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/sda31 228G 5.6G 211G 3% /
> udev
emory-transport-in-open-mpi-now-featuring-3-flavors-of-zero-copy/
-Nathan Hjelm
HPC-5, LANL
On Fri, Oct 17, 2014 at 01:02:23PM -0700, Ralph Castain wrote:
> On Oct 17, 2014, at 12:06 PM, Gus Correa wrote:
> Hi Jeff
>
> Many thanks for looking into this and filing a bug
You could just disable leave pinned:
-mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0
This will fix the issue but may reduce performance. Not sure why the
munmap wrapper is failing to execute but this will get you running.
-Nathan Hjelm
HPC-5, LANL
On Wed, Nov 12, 2014 at 05:08:06PM
One thing that changed between 1.6 and 1.8 is the default binding
policy. Open MPI 1.6 did not bind by default but 1.8 binds to core. You
can unset the binding policy by adding --bind-to none.
-Nathan Hjelm
HPC-5, LANL
On Tue, Dec 09, 2014 at 12:14:32PM -0500, Eric Chamberland wrote:
>
yield when idle is broken on 1.8. Fixing now.
-Nathan
On Tue, Dec 09, 2014 at 01:02:08PM -0800, Ralph Castain wrote:
> Hmmm….well, it looks like we are doing the right thing and running unbound
> when oversubscribed like this. I don’t have any brilliant idea why it would
> be running so slowly
Several things:
- In 1.8.x only shared memory windows work with multiple threads. This
problem will be fixed in the master branch soon. A back-port to 1.8 is
unlikely given the magnitude of the changes.
- I highly recommend using the MPI-3 call MPI_Win_allocate over
MPI_Win_create. Th
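A minimal sketch of that recommendation, using MPI-allocated window memory with passive-target synchronization (sizes and values are arbitrary):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Let MPI allocate the window memory instead of wrapping a user
     * buffer with MPI_Win_create. */
    double *base;
    MPI_Win win;
    MPI_Win_allocate(128 * sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &base, &win);

    MPI_Win_lock_all(0, win);
    if (size > 1 && rank == 0) {
        double value = 42.0;
        MPI_Put(&value, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
        MPI_Win_flush(1, win);
    }
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);   /* frees the window and the memory behind base */
    MPI_Finalize();
    return 0;
}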
Have you turned on valgrind support in Open MPI? That is required to
quiet these bogus warnings.
-Nathan
On Wed, Jan 14, 2015 at 10:17:50AM +, Victor Vysotskiy wrote:
> Hi,
>
> Our parallel applications behaves strange when it is compiled with Openmpi
> v1.8.4 on both Linux and Mac OS X p
There was a bug in the MPI_MODE_NOCHECK path in osc/sm. It has been
fixed on master and a fix has been CMRed to 1.8. Thank you for reporting
this.
In the meantime you can remove MPI_MODE_NOCHECK and it should work
fine.
-Nathan
On Thu, Feb 12, 2015 at 11:10:59PM +0100, Thibaud Kloczko wrote:
>
I recommend using vader for CMA. It has code to get around the ptrace
setting. Run with mca_btl_vader_single_copy_mechanism cma (should be the
default).
-Nathan
On Wed, Feb 18, 2015 at 02:56:01PM -0500, Eric Chamberland wrote:
> Hi,
>
> I have configured with "--with-cma" on 2 differents OS (Re
>
> On both RedHat 6.5 and OpenSuse 12.3 and still get the same error message!!!
> :-/
>
> Sorry, I am not a kernel expert...
>
> What's wrong?
>
> Thanks,
>
> Eric
>
> On 02/18/2015 04:48 PM, Éric Chamberland wrote:
> >
> >Le 2015-02-18 15
On Thu, Feb 19, 2015 at 12:16:49PM -0500, Eric Chamberland wrote:
>
> On 02/19/2015 11:56 AM, Nathan Hjelm wrote:
> >
> >If you have yama installed you can try:
>
> Nope, I do not have it installed... is it absolutely necessary? (and would
> it change something w
, Eric Chamberland wrote:
> On 02/19/2015 02:58 PM, Nathan Hjelm wrote:
> >On Thu, Feb 19, 2015 at 12:16:49PM -0500, Eric Chamberland wrote:
> >>
> >>On 02/19/2015 11:56 AM, Nathan Hjelm wrote:
> >>>
> >>>If you have yama installed you can try:
el: +1 (865) 974-9375 fax: +1 (865) 974-8296
> https://icl.cs.utk.edu/~bouteill/
>
>
>
>
> > Le 19 févr. 2015 à 15:53, Nathan Hjelm a écrit :
> >
> >
> > Great! I will add an MCA variable to force CMA and also enable it if 1)
> > no yama and
Hmm, wait. Yes. Your change went in after 1.8.4 and has the same
effect. If yama isn't installed it is safe to assume that the ptrace
scope is effectively 0. So, your patch does fix the issue.
-Nathan
On Thu, Feb 19, 2015 at 02:53:47PM -0700, Nathan Hjelm wrote:
>
> I don't thi
Aurélien, I should also point out your fix has already been applied to
the 1.8 branch and will be included in 1.8.5.
-Nathan
On Thu, Feb 19, 2015 at 02:57:38PM -0700, Nathan Hjelm wrote:
>
> Hmm, wait. Yes. Your change went in after 1.8.4 and has the same
> effect. If yama ins'
Josh, do you see a hang when using vader? It is preferred over the old
sm btl.
-Nathan
On Mon, Feb 23, 2015 at 03:48:17PM -0500, Joshua Ladd wrote:
>Sachin,
>
>I am able to reproduce something funny. Looks like your issue. When I run
>on a single host with two ranks, the test works
Eric Chamberland wrote:
> Maybe it is a stupid question, but... why it is not tested and enabled by
> default at configure time since it is part of the kernel?
>
> Eric
>
>
> On 02/19/2015 03:53 PM, Nathan Hjelm wrote:
> >Great! I will add an MCA variable to force CMA and also
What program are you using for the benchmark? Are you using the xpmem
branch in my github? For my testing I used a stock ubuntu 3.13 kernel
but I have not fully stress-tested my xpmem branch.
I will see if I can reproduce and fix the hang.
-Nathan
On Mon, Mar 16, 2015 at 05:32:26PM +0100, Tobias
9.
>openmpi and pw was build with the intel compilers, xpmem with gcc.
>
>Kind regards,
>Tobias
>
>On 03/16/2015 05:56 PM, Nathan Hjelm wrote:
>
> What program are you using for the benchmark? Are you using the xpmem
> branch in my github? For my test
t;Kind regards,
>Tobias
>
>On 03/16/2015 05:56 PM, Nathan Hjelm wrote:
>
> What program are you using for the benchmark? Are you using the xpmem
> branch in my github? For my testing I used a stock ubuntu 3.13 kernel
> but I have not full stress-tested my xpm
benefit to using per-peer queue pairs and they do not
scale.
-Nathan Hjelm
HPC-ENV, LANL
On Mon, May 16, 2016 at 12:21:41PM -0400, Xiaolong Cui wrote:
>Hi,
>I am using Open MPI 1.8.6. I guess my question is related to the flow
>control algorithm for small messages. The question
credits?
>Best,
>Michael
> On Mon, May 16, 2016 at 6:35 PM, Nathan Hjelm wrote:
>
> When using eager_rdma the sender will block once it runs out of
> "credits". If the receiver enters MPI for any reason the incoming
> messages will be p
s gone. But
>removing the per-peer queue pair does not help.
>Do you know any document that discusses the open mpi internals, especially
>related to this problem?
>On Tue, May 17, 2016 at 11:00 AM, Nathan Hjelm wrote:
>
> If it is blocking on the first message th
You use the *_base_verbose MCA variables. For example, if you want to see
output from the btl use -mca btl_base_verbose x. The number x controls the
verbosity level. Starting with 2.x there are named levels, but not many components
conform to the names yet. In general components use numbers between
That message is coming from udcm in the openib btl. It indicates some sort of
failure in the connection mechanism. It can happen if the listening thread no
longer exists or is taking too long to process messages.
-Nathan
On Jun 14, 2016, at 12:20 PM, Ralph Castain wrote:
Hmm…I’m unable to r
You ran out of queue pairs. There is no way around this for larger all-to-all
transfers when using the openib btl and SRQ. Need O(cores^2) QPs to fully
connect with SRQ or PP QPs. I recommend using XRC instead by adding:
btl_openib_receive_queues = X,4096,1024:X,12288,512:X,65536,512
to your o
ibv_devinfo -v
-Nathan
On Jun 15, 2016, at 12:43 PM, "Sasso, John (GE Power, Non-GE)"
wrote:
QUESTION: Since the error said the system may have run out of queue pairs, how
do I determine the # of queue pairs the IB HCA can support?
-Original Message-
From: users [mailto:users-boun.
that an upper bound on the number of nodes would be 392632 / 24^2 ~ 681
> nodes. This does not make sense, because I saw the QP creation failure error
> (again, NO error about failure to register enough memory) for as small as 177
> 24-core nodes! I don’t know how to make sense of thi
As of 2.0.0 we now support experimental verbs. It looks like one of the calls
is failing:
#if HAVE_DECL_IBV_EXP_QUERY_DEVICE
device->ib_exp_dev_attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1;
if(ibv_exp_query_device(device->ib_dev_context, &device->ib_exp_dev_attr)){
BTL_ERROR(
You probably will also want to run with -mca pml ob1 to make sure mxm is not in
use. The combination should be sufficient to force tcp usage.
-Nathan
> On Jul 18, 2016, at 10:50 PM, Saliya Ekanayake wrote:
>
> Hi,
>
> I read in a previous thread
> (https://www.open-mpi.org/community/lists/us
Might be worth trying with --mca btl_openib_cpc_include udcm and see if that
works.
-Nathan
On Aug 23, 2016, at 02:41 AM, "Juan A. Cordero Varelaq"
wrote:
Hi Gilles,
If I run it like this:
mpirun --mca btl ^openib,usnic --mca pml ob1 --mca btl_sm_use_knem 0 -np 5
myscript.sh
it works fine
There is a bug in the code that keeps the dynamic regions sorted. Should have
it fixed shortly.
-Nathan
On Aug 25, 2016, at 07:46 AM, Christoph Niethammer wrote:
Hello,
The Error is not 100% reproducible for me every time but seems to disappear
entirely if one excludes
-mca osc ^rdma
or
-mc
Fixed on master. The fix will be in 2.0.2 but you can apply it to 2.0.0 or 2.0.1:
https://github.com/open-mpi/ompi/commit/e53de7ecbe9f034ab92c832330089cf7065181dc.patch
-Nathan
On Aug 25, 2016, at 07:31 AM, Joseph Schuchart wrote:
Gilles,
Thanks for your fast reply. I did some last minute changes to th
We have a new high-speed component for RMA in 2.0.x called osc/rdma. Since the
component is doing direct rdma on the target we are much more strict about the
ranges. osc/pt2pt doesn't bother checking at the moment.
Can you build Open MPI with --enable-debug and add -mca osc_base_verbose 100 to
This error was the result of a typo which caused an incorrect range check when
the compare-and-swap was on a memory region less than 8 bytes away from the end
of the window. We never caught this because in general no apps create a window
as small as that MPICH test (4 bytes). We are adding the
FWIW it works fine for me on my MacBook Pro running 10.12 with Open MPI 2.0.1
installed through homebrew:
✗ brew -v
Homebrew 1.0.0 (git revision c3105; last commit 2016-09-22)
Homebrew/homebrew-core (git revision 227e; last commit 2016-09-22)
✗ brew info openmpi
open-mpi: stable 2.0.1 (bottled)
I didn't think we even used clock_gettime() on Linux in 1.10.x. A quick check
of the git branch confirms that.
ompi-release git:(v1.10) ✗ find . -name '*.[ch]' | xargs grep clock_gettime
ompi-release git:(v1.10) ✗
-Nathan
On Oct 03, 2016, at 10:50 AM, George Bosilca wrote:
This function is n
UDCM does not require IPoIB. It should be working for you. Can you build Open
MPI with --enable-debug and run with -mca btl_base_verbose 100 and create a
gist with the output.
-Nathan
On Nov 01, 2016, at 07:50 AM, Sergei Hrushev wrote:
I haven't worked with InfiniBand for years, but I do be
Integration is already in the 2.x branch. The problem is the way we handle the
info key is a bit of a hack. We currently pull out one info key and pass it
down to the mpool as a string. Ideally we want to just pass the info object so
each mpool can define its own info keys. That requires the inf
Can you configure with --enable-debug and run with --mca btl_base_verbose 100 and
provide the output? It may indicate why neither udcm nor rdmacm are available.
-Nathan
> On Dec 14, 2016, at 2:47 PM, Dave Turner wrote:
>
>
That backtrace shows we are registering MPI_Alloc_mem memory with verbs. This
is expected behavior but it doesn’t show the openib btl being used for any
communication. I am looking into an issue on an OmniPath system where just
initializing the openib btl causes performance problems even if it is
You can not perform synchronization at the same time as communication on the
same target. This means if one thread is in MPI_Put/MPI_Get/MPI_Accumulate
(target) you can’t have another thread in MPI_Win_flush (target) or
MPI_Win_flush_all(). If your program is doing that it is not a valid MPI
pr
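One way to stay within that rule is to serialize per-target communication and flush calls in the application; a sketch with hypothetical helper names, assuming MPI_THREAD_MULTIPLE:

#include <mpi.h>
#include <pthread.h>

/* Serialize MPI_Put and MPI_Win_flush on the same window so no thread
 * flushes while another thread is inside a communication call on that
 * target. A per-window mutex is conservative but simple. */
static pthread_mutex_t win_mutex = PTHREAD_MUTEX_INITIALIZER;

void guarded_put(const void *buf, int count, MPI_Datatype type,
                 int target, MPI_Aint disp, MPI_Win win)
{
    pthread_mutex_lock(&win_mutex);
    MPI_Put(buf, count, type, target, disp, count, type, win);
    pthread_mutex_unlock(&win_mutex);
}

void guarded_flush(int target, MPI_Win win)
{
    pthread_mutex_lock(&win_mutex);
    MPI_Win_flush(target, win);
    pthread_mutex_unlock(&win_mutex);
}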
If this is with 1.10.x or older run with --mca memory_linux_disable 1. There is
a bad interaction between ptmalloc2 and psm2 support. This problem is not
present in v2.0.x and newer.
-Nathan
> On Mar 7, 2017, at 10:30 AM, Paul Kapinos wrote:
>
> Hi Dave,
>
>
>> On 03/06/17 18:09, Dave Love
On Apr 03, 2017, at 08:36 AM, Sebastian Rinke wrote:
Dear all,
I’m using passive target sync. in my code and would like to
know how well it is supported in Open MPI.
In particular, the code is some sort of particle tree code that uses a
distributed tree and every rank
gets non-local tree no
certain flags to enable the
hardware put/get support?
Sebastian
On 03 Apr 2017, at 18:02, Nathan Hjelm wrote:
On Apr 03, 2017, at 08:36 AM, Sebastian Rinke wrote:
Dear all,
I’m using passive target sync. in my code and would like to
know how well it is supported in Open MPI.
In particular
You don't. The memory is freed when the window is freed by MPI_Win_free (). See
MPI-3.1 § 11.2.5
-Nathan
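A minimal sketch of the intended lifetime: the buffer returned by MPI_Win_allocate belongs to the window and is released by MPI_Win_free:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int *base;
    MPI_Win win;
    MPI_Win_allocate(1024 * sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &base, &win);

    /* ... use base / the window ... */

    MPI_Win_free(&win);   /* releases the memory; do not call free(base) */
    MPI_Finalize();
    return 0;
}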
On Apr 24, 2017, at 11:41 AM, Benjamin Brock wrote:
How are we meant to free memory allocated with MPI_Win_allocate()? The
following crashes for me with OpenMPI 1.10.6:
#include
#inclu
This behavior is clearly specified in the standard. From MPI 3.1 § 11.2.4:
In the case of a window created with MPI_WIN_CREATE_DYNAMIC, the target_disp for
all RMA functions is the address at the target; i.e., the effective window_base
is MPI_BOTTOM and the disp_unit is one. For dynamic windows, the
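A sketch of what that means on the origin side, assuming remote_addr is the MPI_Aint the target obtained with MPI_Get_address on its attached memory and sent over beforehand (the helper name is hypothetical):

#include <mpi.h>

/* Origin side for a dynamic window: with MPI_WIN_CREATE_DYNAMIC the
 * effective base is MPI_BOTTOM and disp_unit is 1, so the target's
 * virtual address itself is used as the displacement. */
void get_remote(double *dst, int count, int target, MPI_Aint remote_addr,
                MPI_Win win)
{
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Get(dst, count, MPI_DOUBLE, target,
            remote_addr, count, MPI_DOUBLE, win);
    MPI_Win_unlock(target, win);
}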
Add --mca btl self,vader
-Nathan
> On May 19, 2017, at 1:23 AM, Gabriele Fatigati wrote:
>
> Oh no, by using two procs:
>
>
> findActiveDevices Error
> We found no active IB device ports
> findActiveDevices Error
> We found no active IB device ports
> --
Can you provide a reproducer for the hang? What kernel version are you using?
Is xpmem installed?
-Nathan
On Jun 05, 2017, at 10:53 AM, Matt Thompson wrote:
OMPI Users,
I was wondering if there is a best way to "tune" vader to get around an
intermittent MPI_Wait halt?
I ask because I rece
but my desktop does not have it. So,
perhaps not XPMEM related?
Matt
On Mon, Jun 5, 2017 at 1:00 PM, Nathan Hjelm wrote:
Can you provide a reproducer for the hang? What kernel version are you using?
Is xpmem installed?
-Nathan
On Jun 05, 2017, at 10:53 AM, Matt Thompson wrote:
OMPI Users,
MPI_Comm_create_group is an MPI-3.0+ function. 1.6.x is MPI-2.1. You can use
the macros MPI_VERSION and MPI_SUBVERSION to check the MPI version.
You will have to modify your code if you want it to work with older versions of
Open MPI.
-Nathan
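A minimal sketch of guarding the call so the same code still builds against MPI-2.1 libraries such as Open MPI 1.6.x (note the fallback is collective over the parent communicator):

#include <mpi.h>

/* MPI_Comm_create_group needs MPI >= 3.0; fall back to the collective
 * MPI_Comm_create on older libraries such as Open MPI 1.6.x (MPI-2.1). */
int create_subcomm(MPI_Comm comm, MPI_Group group, int tag, MPI_Comm *newcomm)
{
#if MPI_VERSION >= 3
    return MPI_Comm_create_group(comm, group, tag, newcomm);
#else
    (void) tag;  /* not used by the collective variant */
    return MPI_Comm_create(comm, group, newcomm);
#endif
}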
On Jun 08, 2017, at 03:59 AM, Arham Amouie via us
MPI 3.1 5.12 is pretty clear on the matter:
"It is erroneous to call MPI_REQUEST_FREE or MPI_CANCEL for a request
associated with a nonblocking collective operation."
-Nathan
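A sketch of the compliant pattern: complete the request with MPI_Wait (or MPI_Test until it completes) instead of freeing or cancelling it:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Request req;
    MPI_Ibarrier(MPI_COMM_WORLD, &req);

    /* ... overlap useful work here ... */

    /* A request from a nonblocking collective must be completed;
     * MPI_Request_free or MPI_Cancel on it is erroneous. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}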
> On Jun 9, 2017, at 5:33 AM, Markus wrote:
>
> Dear MPI Users and Maintainers,
>
> I am using openMPI in version 1.
This is not the intended behavior. Please open a bug on github.
-Nathan
On Jun 23, 2017, at 08:21 AM, Joseph Schuchart wrote:
All,
We employ the following pattern to send signals between processes:
```
int com_rank, root = 0;
// allocate MPI window
MPI_Win win = allocate_win();
// do some co
So far only cons. The gcc and sync builtin atomics provide slower performance on
x86-64 (and possibly other platforms). I plan to investigate this as part of
the investigation into requiring C11 atomics from the C compiler.
-Nathan
> On Aug 1, 2017, at 10:34 AM, Dave Love wrote:
>
> What are
I am seeing similar issues on our slurm clusters. We are looking into the issue.
-Nathan
HPC-3, LANL
On Tue, 11 Jan 2011, Michael Di Domenico wrote:
Any ideas on what might be causing this one? Or at least what
additional debug information someone might need?
On Fri, Jan 7, 2011 at 4:03 PM, M
or us that only equates to one small
machine, but it's still annoying. unfortunately, i don't have enough
knowledge to dive into the code to help fix, but i can certainly help
test
On Mon, Jan 24, 2011 at 1:41 PM, Nathan Hjelm wrote:
I am seeing similar issues on our slurm clusters. We a
-Nathan Hjelm
Los Alamos National Laboratory
On Mon, 12 Sep 2011, Samuel K. Gutierrez wrote:
Hi,
This problem can be caused by a variety of things, but I suspect our default
queue pair parameters (QP) aren't helping the
situation :-).
What happens when you add the following to your m
):
options mlx4_core log_mtts_per_seg=X
BTW, what was log_mtts_per_seg set to?
-Nathan Hjelm
Los Alamos National Laboratory
/reloading mlx4_core (after and dependent modules).
-Nathan Hjelm
Los Alamos National Laboratory
On Mon, 12 Sep 2011, Blosch, Edwin L wrote:
It was set to 0 previously. We've set it to 4 and restarted some service and
now it works. So both your and Samuel's suggestions worked.
On another system, slightly older, it was defaulted to 3 instead of 0, and
apparently that explains why the j
I would start by adjusting btl_openib_receive_queues . The default uses a
per-peer QP which can eat up a lot of memory. I recommend using no per-peer and
several shared receive queues. We use S,4096,1024:S,12288,512:S,65536,512
-Nathan
On Thu, 12 Jan 2012, V. Ram wrote:
Open MPI IB Gurus,
I
Abhinav, you shouldn't be using the cray wrappers to build Open MPI or anything
linked against Open MPI. The Cray wrappers will automatically include lots of
stuff you don't want. Use pgcc or icc directly. You shouldn't have any
trouble running in parallel with either aprun or mpirun (or
run on
the compute nodes of the cray cluster (it just ran on the MOM node).
Therefore I have been trying to compile OpenMPI with the cray
wrappers.
I will checkout the cray-xe6 version, and try to follow the instructions.
Thanks!
Abhinav.
On Thu, Feb 16, 2012 at 8:31 AM, Nathan Hjelm wrote
On Mon, 27 Feb 2012, Abhinav Sarje wrote:
Hi Nathan, Gus, Manju,
I got a chance to try out the XE6 support build, but with no success.
First I was getting this error: "PGC-F-0010-File write error occurred
(temporary pragma .s file)". After searching online about this error,
I saw that there i
cursive] Error 1
make[1]: Leaving directory
`/global/u1/a/asarje/hopper/openmpi-dev-trunk/build/ompi'
make: *** [all-recursive] Error 1
--
Any idea why this is happening, and how to fix it? Again, I am using
the XE6 platform configuration file.
Abhinav.
On Wed, Feb 29, 2012 at 12:1
ill builds fine.
On Tue, Mar 6, 2012 at 5:38 AM, Jeffrey Squyres wrote:
I disabled C++ inline assembly for PGI (we already had C inline assembly for
PGI).
So I don't think this should have caused a new error... should it?
On Mar 5, 2012, at 10:21 AM, Nathan Hjelm wrote:
Try pulling a f
The selection of cm is not wrong per se. You will find that the psm mtl is much
better than the openib btl for QLogic hardware.
-Nathan
On Mon, 19 Mar 2012, Jens Glaser wrote:
Hello,
I am using the latest trunk version of OMPI, in order to take advantage of the
new CUDA RDMA features (smcuda
On Wed, Oct 10, 2012 at 02:50:59PM +0200, Christoph Niethammer wrote:
> Hello,
>
> I just tried to use Open MPI 1.7a1r27416 on a Cray XE6 system. Unfortunately
> I
> get the following error when I run a simple HelloWorldMPI program:
>
> $ pirun HelloWorldMPI
> App launch reported: 2 (out of 2)
;mpirun" and then it should work just
> fine.
>
>
>
> On Wed, Oct 10, 2012 at 7:59 AM, Nathan Hjelm wrote:
>
> > On Wed, Oct 10, 2012 at 02:50:59PM +0200, Christoph Niethammer wrote:
> > > Hello,
> > >
> > > I just tried to use Open MPI 1.7
On Mon, Apr 22, 2013 at 03:17:16PM -0700, Mike Clark wrote:
> Hi,
>
> I am trying to run OpenMPI on the Cray XK7 system at Oak Ridge National Lab
> (Titan), and am running in an issue whereby MPI_Init seems to hang
> indefinitely, but this issue only arises at large scale, e.g., when running
>
ove that
(I have some ideas but nothing has been implemented yet). At 8192 nodes this
takes less than a minute. Everything else should be fairly quick.
-Nathan Hjelm
HPC-3, LANL
On Tue, Apr 23, 2013 at 10:17:46AM -0700, Ralph Castain wrote:
>
> On Apr 23, 2013, at 10:09 AM, Nathan Hjelm wrote:
>
> > On Tue, Apr 23, 2013 at 12:21:49PM +0400,
> > wrote:
> >> Hi,
> >>
> >> Nathan, could
On Wed, Apr 24, 2013 at 05:01:43PM +0400, Derbunovich Andrei wrote:
> Thank you to everybody for suggestions and comments.
>
> I have used a relatively small number of nodes (4400). It looks like
> the main issue is that I didn't disable dynamic components opening in my
> openmpi build while kee
If you are only using the C API there will be no issues. There are no
guarantees with C++ or fortran.
-Nathan Hjelm
HPC-3, LANL
On Wed, May 22, 2013 at 03:08:31PM +, Blosch, Edwin L wrote:
> Apologies for not exploring the FAQ first.
>
>
>
> If I want to use Intel or PG
It works with PGI 12.x and it better work with newer versions since offsetof is
ISO C89/ANSI C.
-Nathan
On Wed, May 29, 2013 at 09:31:58PM +, Jeff Squyres (jsquyres) wrote:
> Edwin --
>
> Can you ask PGI support about this? I swear that the PGI compiler suite has
> supported offsetof before
You may also need to update where the binaries and libraries look. See
the man pages for otool and install_name_tool for more information. Here
is a basic example:
bash-3.2# otool -L libmpi.dylib
libmpi.dylib:
/opt/local/lib/libmpi.1.dylib (compatibility version 3.0.0, current
version 3.
gEnv-intel also works)
module unload cray-mpich2 xt-libsci
module load openmpi/1.7.2
-Nathan Hjelm
Open MPI Team, HPC-3, LANL
Hmm, what CLE release is your development cluster running? It is the value
after PrgEnv. Ex. on Cielito we have 4.1.40.
32) PrgEnv-gnu/4.1.40
We have not yet fully tested Open MPI on CLE 5.x.x.
-Nathan Hjelm
HPC-3, LANL
On Tue, Sep 03, 2013 at 10:33:57PM +, Teranishi, Keita wrote:
>
1.37969.2.32.gem 30) eswrap/1.0.8
> 15) rca/1.0.0-2.0401.38656.2.2.gem 31) craype-mc8
> 16) dvs/1.8.6_0.9.0-1.0401.1401.1.120 32) PrgEnv-gnu/4.1.40
>
>
> Thanks,
> Keita
>
>
>
> On 9/3/13 3:42 PM, "Nathan Hjelm" wrote:
>
> >Hm