All,
With some help, I managed to build an Open MPI 4.0.0 with:
./configure --disable-wrapper-rpath --disable-wrapper-runpath --with-psm2
--with-slurm --enable-mpi1-compatibility --with-ucx
--with-pmix=/usr/nlocal/pmix/2.1 --with-libevent=/usr CC=icc CXX=icpc
FC=ifort
The MPI 1 is because I need
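A quick way to confirm what such a build actually picked up, assuming the new install's bin/ directory is first in PATH, is to ask ompi_info for the relevant components, for example:
  ompi_info | grep -i psm2
  ompi_info | grep -i ucx
If a component is missing from that output, the corresponding --with-... dependency was most likely not found at configure time.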
On Jan 18, 2019, at 12:43 PM, Matt Thompson wrote:
>
> With some help, I managed to build an Open MPI 4.0.0 with:
We can discuss each of these params to let you know what they are.
> ./configure --disable-wrapper-rpath --disable-wrapper-runpath
Did you have a reason for disabling these? They'
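For reference, whether the resulting wrappers embed an rpath/runpath can be checked directly, assuming the new mpicc is in PATH:
  mpicc --showme:link
With the two --disable-wrapper-* options above, no rpath-related linker flags should appear in that output.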
On Fri, Jan 18, 2019 at 1:13 PM Jeff Squyres (jsquyres) via users <
users@lists.open-mpi.org> wrote:
> On Jan 18, 2019, at 12:43 PM, Matt Thompson wrote:
> >
> > With some help, I managed to build an Open MPI 4.0.0 with:
>
> We can discuss each of these params to let you know what they are.
>
> >
Hi Matt,
A few comments/questions:
- If your cluster has Omni-Path, you won’t need UCX. Instead you can
run using PSM2, or alternatively OFI (a.k.a. Libfabric)
- With the command you shared below (4 ranks on the local node) (I
think) a shared mem transport is being selected (va
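Forcing one transport or another for a quick comparison is normally just MCA selection on the mpirun line; assuming the psm2 and ofi MTLs were built, something along these lines:
  mpirun --mca pml cm --mca mtl psm2 -np 4 ./a.out
  mpirun --mca pml cm --mca mtl ofi -np 4 ./a.out
  mpirun --mca pml ucx -np 4 ./a.out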
I compiled PMIx, Slurm, and Open MPI:
---pmix
./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13
--disable-debug
---slurm
./configure --prefix=/hpc/slurm/18.08 --with-munge=/hpc/munge/0.5.13
--with-pmix=/hpc/pmix/2.2
---openmpi
./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external
--wit
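A typical Open MPI configure invocation for pairing with an external PMIx and Slurm (a sketch reusing the paths from the PMIx and Slurm builds above, not necessarily the poster's actual command) looks like:
  ./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external \
      --with-libevent=external --with-pmix=/hpc/pmix/2.2 --with-slurm
An external PMIx generally has to be paired with the same external libevent it was built against.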
Looks strange. I’m pretty sure Mellanox didn’t implement the event notification
system in the Slurm plugin, but you should only be trying to call it if OMPI is
registering a system-level event code - which OMPI 3.1 definitely doesn’t do.
If you are using PMIx v2.2.0, then please note that there
Here are the branches I'm using. I did a git clone on the repos and
then a git checkout:
[ec2-user@labhead bin]$ cd /hpc/src/pmix/
[ec2-user@labhead pmix]$ git branch
master
* v2.2
[ec2-user@labhead pmix]$ cd ../slurm/
[ec2-user@labhead slurm]$ git branch
* (detached from origin/slurm-18.08)
ma
Aha - I found it. It’s a typo in the v2.2.1 release. Sadly, our Slurm plugin
folks seem to be off somewhere for a while and haven’t been testing it. Sigh.
I’ll patch the branch and let you know - we’d appreciate the feedback.
Ralph
> On Jan 18, 2019, at 2:09 PM, Michael Di Domenico
> wrote:
>
I have pushed a fix to the v2.2 branch - could you please confirm it?
> On Jan 18, 2019, at 2:23 PM, Ralph H Castain wrote:
>
> Aha - I found it. It’s a typo in the v2.2.1 release. Sadly, our Slurm plugin
> folks seem to be off somewhere for a while and haven’t been testing it. Sigh.
>
> I’ll
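Confirming a fix like this usually just means pulling the updated branch and rebuilding PMIx the same way as before; roughly, from the checkout used above:
  git checkout v2.2 && git pull
  ./autogen.pl && ./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13 --disable-debug
  make -j install
then restarting the Slurm daemons so they pick up the updated library.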
seems to be better now. jobs are running
On Fri, Jan 18, 2019 at 6:17 PM Ralph H Castain wrote:
>
> I have pushed a fix to the v2.2 branch - could you please confirm it?
>
>
> > On Jan 18, 2019, at 2:23 PM, Ralph H Castain wrote:
> >
> > Aha - I found it. It’s a typo in the v2.2.1 release. Sadl
Good - thanks!
> On Jan 18, 2019, at 3:25 PM, Michael Di Domenico
> wrote:
>
> seems to be better now. jobs are running
>
> On Fri, Jan 18, 2019 at 6:17 PM Ralph H Castain wrote:
>>
>> I have pushed a fix to the v2.2 branch - could you please confirm it?
>>
>>
>>> On Jan 18, 2019, at 2:23
Greetings everyone,
I have a scientific code using Open MPI (v3.1.3) that seems to work fine when
MPI_Bcast() and MPI_Reduce() calls are well spaced out in time. Yet if the
time between these calls is short, eventually one of the nodes hangs at some
random point, never returning from the broadcast.
Since neither bcast nor reduce acts as a barrier, it is possible to run out of
resources if either of these calls (or both) is used in a tight loop. The sync
coll component exists for this scenario. You can enable it by adding the
following to mpirun (or setting these variables through the environment).
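Judging from the command quoted in the follow-up below, the settings in question would look roughly like this on the command line:
  mpirun --mca coll_sync_priority 100 --mca coll_sync_barrier_after 10 -np 2 ./a.out
or, since any MCA parameter can also be set as an OMPI_MCA_<name> environment variable:
  export OMPI_MCA_coll_sync_priority=100
  export OMPI_MCA_coll_sync_barrier_after=10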
Hi,
Thanks for the quick response. But it looks like I am missing something
because neither -mca nor --mca is being recognized by my mpirun command.
% mpirun --mca coll_sync_priority 100 --mca coll_sync_barrier_after 10 -q -np 2
a.out
Jeff,
that could be a copy/paste error and/or an email client issue.
The syntax is
mpirun --mca variable value ...
(short hyphen, short hyphen, m, c, a)
The error message is about the missing —-mca executable
(long hyphen, short hyphen, m, c, a)
This is most likely the root cause of this issue
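Side by side, the difference is only in the leading characters of the option; mpirun treats the first form as the name of an executable to launch and the second as an option:
  mpirun —-mca coll_sync_priority 100 ...   (em dash plus hyphen, pasted from email)
  mpirun --mca coll_sync_priority 100 ...   (two plain ASCII hyphens, typed by hand)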