I see, thanks
Is there any plan to apply any optimizations to the neighbor collectives at
some point?
regards
Michael
On Wed, Jun 8, 2022 at 1:29 PM George Bosilca wrote:
> Michael,
>
> As far as I know none of the implementations of the
> neighborhood collectives in OMPI are
" way to provide
optimized neighborhood collectives?
Thank you very much
Michael
perhaps there is
> different initialization that happens such that the offending device search
> problem doesn't occur?
>
>
> Thanks,
>
> David
>
>
>
>
> From: Shrader, David Lee
> Sent: Tuesday, November 2, 2021 2:09 P
fairly frequently, but not every time, when trying to run xhpl on a new
machine i'm bumping into this. it happens with a single node or
multiple nodes
node1 selected pml ob1, but peer on node1 selected pml ucx
if i rerun the exact same command a few minutes later, it works fine.
the machine is new
with each other?
Another idea that came to mind was to get an OpenMPI build that would not have
any high performance fabric support and would only work via TCP. So any advice
on how to accomplish my goal would be appreciated.
I realize that performance-wise that is going to be quite... sad. But
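A minimal sketch of one way to get such a build (untested; flag availability varies by Open MPI release, so check ./configure --help first):
./configure --prefix=$HOME/ompi-tcp-only --without-ucx --without-verbs --without-psm2 --without-ofi
# then restrict run-time selection to TCP plus shared memory explicitly
mpirun --mca pml ob1 --mca btl tcp,self,vader -np 4 ./my_app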
Wednesday, May 19, 2021 11:31 AM
To: Open MPI Users
Cc: Heinz, Michael William
Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath
Just some more data from my OmniPath based cluster.
There certainly was a change from 4.0.x to 4.1.x
With 4.0.1 I would build openmpi with
.
do it. However, note that the format of the string must be 16
hex digits, a hyphen, then 16 more hex digits. Anything else will be rejected.
Also, I have never tried doing this, YMMV.
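For illustration only, a value in that format would look like 0123456789abcdef-fedcba9876543210 (the digits themselves are placeholders).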
From: Heinz, Michael William
Sent: Wednesday, May 19, 2021 10:35 AM
To: Open MPI Users
Cc: Ralph Castain
Subj
nta Fe, ARGENTINA.
Tel +54-342-4511594/95 ext 7062, fax: +54-342-4511169
What am I missing and how can I improve the performance?
Regards, Pavel Mezentsev.
On Mon, May 10, 2021 at 6:20 PM Heinz, Michael William <
michael.william.he...@cornelisnetworks.com
That warning is an annoying bit of cruft from the openib / verbs provider that
can be ignored. (Actually, I recommend using "-btl ^openib" to suppress the
warning.)
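For example, a sketch of the full option spelling (rank count and binary are placeholders):
mpirun --mca btl ^openib -np 4 ./my_app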
That said, there is a known issue with selecting PSM2 and OMPI 4.1.0. I'm not
sure that that's the problem you're hitting, though,
ds...
By the way, have you looked at using Easybuild? Would be good to have your
input there maybe.
On Wed, 7 Apr 2021 at 01:01, Heinz, Michael William via users
users@lists.open-mpi.org> wrote:
I’m having a heck of a time building OMPI with Intel C. Compilation goes fine,
ins
Giles,
I’ll double check - but the intel runtime is installed on all machines in the
fabric.
-
Michael Heinz
michael.william.he...@cornelisnetworks.com
On Apr 7, 2021, at 2:42 AM, Gilles Gouaillardet via users
mailto:users@list
rs_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libintlc.so.5
(0x7fdaa23e1000)
/lib64/ld-linux-x86-64.so.2 (0x7fdaa66d6000)
Can anyone suggest what I'm forgetting to do?
---
Michael Heinz
Fabric Software Engineer, Cornelis Networks
/intel/oneapi/compiler/2021.2.0/linux/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/10
Selected GCC installation: /usr/lib/gcc/x86_64-redhat-linux/10
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
Regards,
Michael!
> bend linux4ms.net
It looks like you're trying to build Open MPI with the Intel C compiler. TBH -
I think that icc isn't included with the latest release of oneAPI, I think
they've switched to including clang instead. I had a similar issue to yours but
I resolved it by installing a 2020 version of the Intel HPC so
On Mon, Mar 22, 2021 at 11:13 AM Pritchard Jr., Howard wrote:
> https://github.com/Sandia-OpenSHMEM/SOS
> if you want to use OpenSHMEM over OPA.
> If you have lots of cycles for development work, you could write an OFI SPML
> for the OSHMEM component of Open MPI.
thanks, i am aware of the sandi
i can build and run openmpi on an opa network just fine, but it turns
out building openshmem fails. the message is (no spml) found
looking at the config log it looks like it tries to build spml ikrit
and ucx which fail. i turn ucx off because it doesn't support opa and
isn't needed.
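As a sketch, leaving UCX out at configure time looks like (other options omitted):
./configure --without-ucx ...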
so this mes
I’ve begun getting this annoyingly generic warning, too. It appears to be
coming from the openib provider. If you disable it with --mca btl ^openib the
warning goes away.
Sent from my iPad
> On Mar 13, 2021, at 3:28 PM, Bob Beattie via users
> wrote:
>
> Hi everyone,
>
> To be honest, as an MPI
What interconnect are you using at run time? That is, are you using Ethernet or
InfiniBand or Omnipath?
Sent from my iPad
On Mar 4, 2021, at 5:05 AM, Raut, S Biplab via users
wrote:
[AMD Official Use Only - Internal Distribution Only]
After downloading a particular openMPI version, let’s
this might be happening? I do not see this with
OMPI 4.0.3.
---
Michael Heinz
Fabric Software Engineer, Cornelis Networks
Patrick,
A few more questions for you:
1. What version of IFS are you running?
2. Are you using CUDA cards by any chance? If so, what version of CUDA?
-Original Message-
From: Heinz, Michael William
Sent: Wednesday, January 27, 2021 3:45 PM
To: Open MPI Users
Subject: RE: [OMPI users
Patrick,
Do you have any PSM2_* or HFI_* environment variables defined in your run time
environment that could be affecting things?
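A quick way to check, as a sketch:
env | grep -E '^(PSM2|HFI)_'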
-Original Message-
From: users On Behalf Of Heinz, Michael
William via users
Sent: Wednesday, January 27, 2021 3:37 PM
To: Open MPI Users
Cc: Heinz
Unfortunately, OPA/PSM support for Debian isn't handled by Intel directly or by
Cornelis Networks - but I should point out you can download the latest official
source for PSM2 and the drivers from Github.
-Original Message-
From: users On Behalf Of Michael Di Domenico
via users
tible with PSM and OPA when running specifically on
debian (likely due to library versioning). i don't know how common
that is, so it's not clear how fleshed out and tested it is
On Wed, Jan 27, 2021 at 3:07 PM Patrick Begou via users
wrote:
>
> Hi Howard and Michael
>
> first man
2021 at 3:44 PM Patrick Begou via users
wrote:
>
> Hi Michael
>
> indeed I'm a little bit lost with all these parameters in OpenMPI, mainly
> because for years it works just fine out of the box in all my deployments on
> various architectures, interconnects and linux flavor. S
Patrick how are you using original PSM if you’re using Omni-Path hardware? The
original PSM was written for QLogic DDR and QDR Infiniband adapters.
As far as needing openib - the issue is that the PSM2 MTL doesn’t support a
subset of MPI operations that we previously used the pt2pt BTL for. For
Patrick, is your application multi-threaded? PSM2 was not originally designed
for multiple threads per process.
I do know that the OSU alltoallV test does pass when I try it.
Sent from my iPad
> On Jan 25, 2021, at 12:57 PM, Patrick Begou via users
> wrote:
>
> Hi Howard
What happens if you specify -mtl ofi ?
-Original Message-
From: users On Behalf Of Patrick Begou via
users
Sent: Monday, January 25, 2021 12:54 PM
To: users@lists.open-mpi.org
Cc: Patrick Begou
Subject: Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path
Hi Howard and Michael,
thanks
Patrick,
You really have to provide us some detailed information if you want assistance.
At a minimum we need to know if you're using the PSM2 MTL or the OFI MTL and
what the actual error is.
Please provide the actual command line you are having problems with, along with
any errors. In additio
Hi,
just tried 4.0.5rc1 and this is working as 4.0.3 (directly and via
slurm). So it is just 4.0.4 not working. Diffed Config and build.sh, but
couldn't find anything. I don't know why, but I'll accept it...
Regards,
Michael!
On 08/08/2020 18:46, Howard Pritchard wrote:
slurm support there is no need to
# specify the number of processes or a hostfile to mpirun.
/opt/openmpi/${OPENMPI}/gcc/bin/mpirun ${BIND_OPT} --mca
pmix_base_verbose 100 --debug-daemons
./OWnetbench/OWnetbench.openmpi-${OPENMPI}
done
On 08/08/2020 18:46, Howard Pritchard wrote:
Hello Mic
Hi,
I have a small setup with one headnode and two compute nodes connected
via IB-QDR running CentOS 8.2 and Mellanox OFED 4.9 LTS. I installed
openmpi 3.0.6, 3.1.6, 4.0.3 and 4.0.4 with identical configuration
(configure, compile, nothing configured in openmpi-mca-params.conf), the
output fr
That's it! I was trying to remember what the setting was but I haven't worked on
those HCAs since around 2012, so it was faint.
That said, I found the Intel TrueScale manual online at
https://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/OFED_Host_Software_UserG
Prentice,
Avoiding the obvious question of whether your FM is running and the fabric is
in an active state, it sounds like you're exhausting a resource on the cards.
Ralph is correct about support for QLogic cards being long past but I’ll see
what I can dig up in the archives on Monday to see if
d to be what Mellanox used to configure OpenMPI in HPC-X
> 2.5.
>
> I have users using GCC, PGI, Intel and AOCC compilers with this config. PGI
> was the only one that
> was a challenge to build due to conflicts with HCOLL.
>
> -Ray Muno
>
> On 2/7/20 10:04 AM, Michael Di
i haven't compiled openmpi in a while, but i'm in the process of
upgrading our cluster.
the last time i did this there were specific versions of mpi/pmix/ucx
that were all tested and supposed to work together. my understanding
of this was because pmi/ucx was under rapid development and the api's
btl_base_verbose may do what you need. Add it to your mpirun arguments. For
example:
[LINUX hds1fna2271 20200116_1404 mpi_apps]#
/usr/mpi/gcc/openmpi-3.1.6/bin/mpirun -np 2 -map-by node --allow-run-as-root
-machinefile /usr/src/opa/mpi_apps/mpi_hosts -mca btl self,openib,vader -mca
btl_base_ve
Emmanuel Thomé,
Thanks for bringing this to our attention. It turns out this issue affects all
OFI providers in open-mpi. We've applied a fix to the 3.0.x and later branches
of open-mpi/ompi on github. However, you should be aware that this fix simply
adds the appropriate error message, it does
unfortunately it takes a while to export the data, but here's what i see
On Mon, Mar 11, 2019 at 11:02 PM Gilles Gouaillardet wrote:
>
> Michael,
>
>
> this is odd, I will have a look.
>
> Can you confirm you are running on a single node ?
>
>
> At first, you
On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet
wrote:
> You can force
> mpirun --mca pml ob1 ...
> And btl/vader (shared memory) will be used for intra node communications ...
> unless MPI tasks are from different jobs (read MPI_Comm_spawn())
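A sketch that both forces ob1 and reports which BTL each pair of peers actually uses (rank count and binary are placeholders):
mpirun --mca pml ob1 --mca btl_base_verbose 100 -n 16 ./IMB-MPI1 alltoallv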
if i run
mpirun -n 16 IMB-MPI1 alltoallv
thing
On Mon, Mar 11, 2019 at 12:19 PM Ralph H Castain wrote:
> OFI uses libpsm2 underneath it when omnipath detected
>
> > On Mar 11, 2019, at 9:06 AM, Gilles Gouaillardet
> > wrote:
> > It might show that pml/cm and mtl/psm2 are used. In that case, then yes,
> > the OmniPath library is used even fo
On Mon, Mar 11, 2019 at 11:51 AM Ralph H Castain wrote:
> You are probably using the ofi mtl - could be psm2 uses loopback method?
according to ompi_info i do in fact have mtl's ofi,psm,psm2. i
haven't changed any of the defaults, so are you saying in order to change
the behaviour i have to run mpi
i have a user that's claiming when two ranks on the same node want to
talk with each other, they're using the NIC to talk rather then just
talking directly.
i've never had to test such a scenario. is there a way for me to
prove one way or another whether two ranks are talking through say the
kern
s a typo in the v2.2.1 release. Sadly, our Slurm
> > plugin folks seem to be off somewhere for a while and haven't been testing
> > it. Sigh.
> >
> > I’ll patch the branch and let you know - we’d appreciate the feedback.
> > Ralph
> >
> >
> >> On
adding
>
> PMIX_MCA_pmix_client_event_verbose=5
> PMIX_MCA_pmix_server_event_verbose=5
> OMPI_MCA_pmix_base_verbose=10
>
> to your environment and see if that provides anything useful.
>
> > On Jan 18, 2019, at 12:09 PM, Michael Di Domenico
> > wrote:
> >
> > i compilie
i compiled pmix, slurm, and openmpi
---pmix
./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13
--disable-debug
---slurm
./configure --prefix=/hpc/slurm/18.08 --with-munge=/hpc/munge/0.5.13
--with-pmix=/hpc/pmix/2.2
---openmpi
./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external
--wit
On Mon, Nov 12, 2018 at 8:08 AM Andrei Berceanu
wrote:
>
> Running a CUDA+MPI application on a node with 2 K80 GPUs, I get the following
> warnings:
>
> --
> WARNING: There is at least non-excluded one OpenFabrics device foun
On Wed, May 9, 2018 at 9:45 PM, Howard Pritchard wrote:
>
> You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a
> switch), and install that
> on your system, or else install xpmem (https://github.com/hjelmn/xpmem).
> Note there is a bug right now
> in UCX that you may hit if
before i debug ucx further (cause it's totally not working for me), i
figured i'd check to see if it's *really* required to use shmem inside
of openmpi. i'm pretty sure the answer is yes, but i wanted to double
check.
On Mon, Apr 23, 2018 at 6:07 PM, r...@open-mpi.org wrote:
> Looks like the problem is that you didn’t wind up with the external PMIx. The
> component listed in your error is the internal PMIx one which shouldn’t have
> built given that configure line.
>
> Check your config.out and see what happe
i'm trying to get slurm 17.11.5 and openmpi 3.0.1 working with pmix.
everything compiled, but when i run something it get
: symbol lookup error: /openmpi/mca_pmix_pmix2x.so: undefined symbol:
opal_libevent2022_evthread_use_pthreads
i'm more than sure i did something wrong, but i'm not sure what, h
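Given the reply above that the internal PMIx component was built instead of the external one, an untested sketch of a configure line that forces the external PMIx (and keeps libevent and hwloc external so all three packages share them) is:
./configure --prefix=/hpc/ompi/3.1 --with-pmix=/hpc/pmix/2.2 --with-libevent=external --with-hwloc=external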
On Sat, Apr 7, 2018 at 3:50 PM, Jeff Squyres (jsquyres)
wrote:
> On Apr 6, 2018, at 8:12 AM, Michael Di Domenico
> wrote:
>> it would be nice if openmpi had (or may already have) a simple switch
>> that lets me disable entire portions of the library chain, ie this
>
On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet
wrote:
> That being said, the error suggest mca_oob_ud.so is a module from a
> previous install,
> Open MPI was not built on the system it is running, or libibverbs.so.1
> has been removed after
> Open MPI was built.
yes, understood, i compiled
i'm trying to compile openmpi to support all of our interconnects,
psm/openib/mxm/etc
this works fine, openmpi finds all the libs, compiles and runs on each
of the respective machines
however, we don't install the libraries for everything everywhere
so when i run things like ompi_info and mpirun
OK,
Thanks for your help.
Mike...
On 02/26/2018 05:07 PM, Marco Atzeri wrote:
> On 26/02/2018 22:57, Michael A. Saverino wrote:
>>
>> Marco,
>>
>> If you disable the loopback as well as the other adapters via Device
>> Manager, you should be able to reproduc
Marco,
If you disable the loopback as well as the other adapters via Device
Manager, you should be able to reproduce the error.
Mike...
On 02/26/2018 04:51 PM, Marco Atzeri wrote:
> On 26/02/2018 22:10, Michael A. Saverino wrote:
>>
>> Marco,
>>
>> I think oob still
answer Windows firewall questions (if enabled) permitting/not
permitting orterun and my application. Do you have the Microsoft
Loopback adapter installed on your system?
Many Thanks,
Mike...
On 02/26/2018 02:11 PM, Marco Atzeri wrote:
> On 26/02/2018 18:14, Michael A. Saverino wrote:
>>
s other than
> shared memory - note that you always must enable the “self” btl.
>
> Second, you likely also need to ensure that the OOB isn’t trying to use tcp,
> so add “-mca oob ^tcp” to your cmd line. It shouldn’t be active anyway, but
> better safe.
>
>
>> On Feb 26
following qualifiers in my OMPI command to no avail:
--mca btl ^tcp,self,sm
So the question is, am I able to disable TCP networking, either via
command line or code, if I only plan to use cores on a single machine
for OMPI execution?
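As a sketch of the two valid forms (the ^ exclusion syntax and an explicit include list cannot be mixed in one value):
mpirun --mca btl ^tcp -np 4 ./my_app
mpirun --mca btl self,vader -np 4 ./my_app   (self,sm on older releases)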
Many Thanks,
Mike...
--
Michael A.Saverino
Contractor
openmpi-2.0.2 running on rhel 7.4 with qlogic QDR infiniband
switches/adapters, also using slurm
i have a user that's running a job over multiple days. unfortunately
after a few days at random the job will seemingly hang. the latest
instance was caused by an infiniband adapter that went offline
Maybe you have an idea why it didn't work with those private variables? But
- well, if not there would not be a problem any more (although I don't know
why). ;)
Best regards
Michael
______
Dipl.-Ing. Michael Mauersberger
michael.
ered a similar problem and is able to help me. I
would be really grateful.
Thanks,
Michael
___
Dipl.-Ing. Michael Mauersberger <michael.mauersber...@tu-dresden.de>
Tel. +49 351 463 38099 | Fax +49 351 463 37263
Marschnerstraße 30,
my cluster nodes are connected on 1g ethernet eth0/eth1 and via
infiniband rdma and ib0
my understanding is that openmpi will detect all these interfaces.
using eth0/eth1 for connection setup and use rdma for msg passing
what would be appropriate command line parameters to tell
openmpi to i
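A sketch of one possible combination (interface names are from the post above; component names vary by release and installed stack):
mpirun --mca oob_tcp_if_include eth0,eth1 --mca btl openib,self,vader -np 4 ./my_app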
i'm getting stuck trying to run some fairly large IMB-MPI alltoall
tests under openmpi 2.0.2 on rhel 7.4
i have two different clusters, one running mellanox fdr10 and one
running qlogic qdr
if i issue
mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv
the job just stalls after t
This discussion started getting into an interesting question: ABI
standardization for portability by language. It makes sense to have ABI
standardization for portability of objects across environments. At the same
time it does mean that everyone follows the exact same recipe for low level
implement
OMP is yet another source of incompatibility between GNU and Intel
environments. So compiling say Fortran OMP code into a library and trying
to link it with Intel Fortran codes just aggravates the problem.
Michael
On Mon, Sep 18, 2017 at 7:35 PM, Gilles Gouaillardet <
gilles.gouail
different
compilation environments.
Thank you,
Michael
On Mon, Sep 18, 2017 at 7:35 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:
> Even if i do not fully understand the question, keep in mind Open MPI
> does not use OpenMP, so from that point of view, Open MPI is
>
Thanks for the note. How about OMP runtimes though?
Michael
On Mon, Sep 18, 2017 at 3:21 PM, n8tm via users
wrote:
> On Linux and Mac, Intel c and c++ are sufficiently compatible with gcc and
> g++ that this should be possible. This is not so for Fortran libraries or
>
OpenMPI compiler wrappers to use the Intel
compiler set? Would there be any issues with compiling C++ / Fortran or
corresponding OMP codes ?
In general, what is a clean way to build OpenMPI with a GNU compiler set but
then instruct the wrappers to use Intel compiler set?
Thanks!
Michael
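One well-known mechanism, as a sketch, is to override the wrapper's underlying compiler through environment variables (the caveats about mixing runtimes discussed above still apply):
export OMPI_CC=icc OMPI_CXX=icpc OMPI_FC=ifort
mpicc hello.c -o hello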
On Thu, Jun 22, 2017 at 12:41 PM, r...@open-mpi.org wrote:
> I gather you are using OMPI 2.x, yes? And you configured it
> --with-pmi=, then moved the executables/libs to your
> workstation?
correct
> I suppose I could state the obvious and say “don’t do that - just rebuild it”
correct... bu
On Thu, Jun 22, 2017 at 10:43 AM, John Hearns via users
wrote:
> Having had some problems with ssh launching (a few minutes ago) I can
> confirm that this works:
>
> --mca plm_rsh_agent "ssh -v"
this doesn't do anything for me
if i set OMPI_MCA_sec=^munge
i can clear the mca_sec_munge error
bu
35 AM, r...@open-mpi.org wrote:
> You can add "OMPI_MCA_plm=rsh OMPI_MCA_sec=^munge” to your environment
>
>
> On Jun 22, 2017, at 7:28 AM, John Hearns via users
> wrote:
>
> Michael, try
> --mca plm_rsh_agent ssh
>
> I've been fooling with this myself rec
is it possible to disable slurm/munge/psm/pmi(x) from the mpirun
command line or (better) using environment variables?
i'd like to use the installed version of openmpi i have on a
workstation, but it's linked with slurm from one of my clusters.
mpi/slurm work just fine on the cluster, but when i
On Mon, Jul 25, 2016 at 4:53 AM, Gilles Gouaillardet wrote:
>
> as a workaround, you can configure without -noswitcherror.
>
> after you ran configure, you have to manually patch the generated 'libtool'
> file and add the line with pgcc*) and the next line like this :
>
> /* if pgcc is used, libto
pthread" from libslurm.la and libpmi.la
>>
>> On 07/11/2016 02:54 PM, Michael Di Domenico wrote:
>>>
>>> I'm trying to get openmpi compiled using the PGI compiler.
>>>
>>> the configure goes through and the code starts to compile, but the
On Mon, Jul 11, 2016 at 9:52 AM, Åke Sandgren wrote:
> Looks like you are compiling with slurm support.
>
> If so, you need to remove the "-pthread" from libslurm.la and libpmi.la
i don't see a configure option in slurm to disable pthreads, so i'm
not sure this is possible.
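Since the suggested change is to the installed libtool archive files rather than to slurm's configure options, a sketch (paths are placeholders):
sed -i 's/ -pthread//g' /path/to/slurm/lib/libslurm.la /path/to/slurm/lib/libpmi.la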
On Thu, Jul 14, 2016 at 9:47 AM, Michael Di Domenico
wrote:
> Have 1.10.3 unpacked, ran through the configure using the same command
> line options as 1.10.2
>
> but it fails even earlier in the make process at
>
> Entering openmpi-1.10.3/opal/asm
> CPPAS atomic-asm.lo
>
cense for the pgCC C++ compiler ?
> fwiw, FreePGI on OSX has no C++ license and PGI C and gnu g++ does not work
> together out of the box, hopefully I will have a fix ready sometimes this
> week
>
> Cheers,
>
> Gilles
>
>
> On Monday, July 11, 2016, Michael Di Domenico
&
On Mon, Jul 11, 2016 at 9:11 AM, Gilles Gouaillardet
wrote:
> Can you try the latest 1.10.3 instead ?
i can but it'll take a few days to pull the software inside.
> btw, do you have a license for the pgCC C++ compiler ?
> fwiw, FreePGI on OSX has no C++ license and PGI C and gnu g++ does not wor
I'm trying to get openmpi compiled using the PGI compiler.
the configure goes through and the code starts to compile, but then
gets hung up with
entering: openmpi-1.10.2/opal/mca/common/pmi
CC common_pmi.lo
CCLD libmca_common_pmi.la
pgcc-Error-Unknown switch: - pthread
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A
wrote:
> I was looking for lines like" [nodexyz:17085] selected cm best priority 40"
> and " [nodexyz:17099] select: component psm selected"
this may have turned up more than i expected. i recompiled openmpi
v1.8.4 as a test and reran the test
On Thu, Mar 17, 2016 at 12:52 PM, Jeff Squyres (jsquyres)
wrote:
> Can you send all the information listed here?
>
> https://www.open-mpi.org/community/help/
>
> (including the full output from the run with the PML/BTL/MTL/etc. verbosity)
>
> This will allow Matias to look through all the rele
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A
wrote:
> I was looking for lines like" [nodexyz:17085] selected cm best priority 40"
> and " [nodexyz:17099] select: component psm selected"
i see cm best priority 20, which seems to relate to ob1 being
selected. i don't see a mention of psm a
On Wed, Mar 16, 2016 at 4:49 PM, Cabral, Matias A
wrote:
> I didn't go into the code to see who is actually calling this error message,
> but I suspect this may be a generic error for "out of memory" kind of thing
> and not specific to the queue pair. To confirm please add -mca
> pml_base_verbos
On Wed, Mar 16, 2016 at 3:37 PM, Cabral, Matias A
wrote:
> Hi Michael,
>
> I may be missing some context, if you are using the qlogic cards you will
> always want to use the psm mtl (-mca pml cm -mca mtl psm) and not openib btl.
> As Tom suggests, confirm the limits are setu
On Wed, Mar 16, 2016 at 12:12 PM, Elken, Tom wrote:
> Hi Mike,
>
> In this file,
> $ cat /etc/security/limits.conf
> ...
> < do you see at the end ... >
>
> * hard memlock unlimited
> * soft memlock unlimited
> # -- All InfiniBand Settings End here --
> ?
Yes. I double checked that it's set on a
On Thu, Mar 10, 2016 at 11:54 AM, Michael Di Domenico
wrote:
> when i try to run an openmpi job with >128 ranks (16 ranks per node)
> using alltoall or alltoallv, i'm getting an error that the process was
> unable to get a queue pair.
>
> i've checked the max lock
when i try to run an openmpi job with >128 ranks (16 ranks per node)
using alltoall or alltoallv, i'm getting an error that the process was
unable to get a queue pair.
i've checked the max locked memory settings across my machines;
using ulimit -l in and outside of mpirun and they're all set to u
and that I am not able to track down.
Sorry for having wasted your collective time on this; if this error should
arise again, I will try to get a proper Valgrind report with --enable-debug and
report it here.
Michael
> On 30 Jul 2015, at 22:10 , Nathan Hjelm wrote:
>
>
> I agre
Hi Ralph,
Thanks a lot for the fast reply and the clarification. We’ve re-added the
parameter to our MCA site configuration file.
Michael
On 14 Aug 2015, at 15:00 , Ralph Castain
r...@open-mpi.org> wrote:
During the 1.7 series, we changed things at the request of system adm
r a feature? We recently upgraded from 1.6.x
to 1.8.7, and as far as I remember, in 1.6.x oversubscription was enabled by
default.
Regards,
Michael
P.S.: In ompi_info, both rmaps_base_no_oversubscribe and
rmaps_base_oversubscribe are reported as “false”. Our
prefix/etc/openmpi-mca-params.conf file is empty.
If it is helpful, I can try to compile OpenMPI with debug information and get
more details on the reported error. However, it would be good if someone could
tell me the necessary compile flags (on top of -O0 -g) and it would take me
probably 1-2 weeks to do it.
Michael
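As an untested sketch of a debug-enabled build on top of the existing configure options:
./configure --enable-debug --enable-memchecker CFLAGS='-O0 -g' ...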
Original
Hi Ralph,
That’s what I suspected. Thank you for your confirmation.
Michael
On 25 Jul 2015, at 16:10 , Ralph Castain
r...@open-mpi.org> wrote:
Looks to me like a false positive - we do malloc some space, and do access
different parts of it. However, it looks like we are insi
"io_ompio_delete_priority" (current value:
"10", data source: default, level: 9 dev/all, type: int)
So it seems we are indeed using ROMIO. Any suggestions what that means with
respect to our file coherence issue?
Regards,
Michael
On 23 Jul 2015, at 14:07 , Gilles Gouaillardet
Gilles (see
other mail in thread) suggested, I am not sure whether we use romio or ompio,
but I do not know how to find out.
Michael
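A sketch of one way to see which io component and priority are in effect (parameter names differ across releases):
ompi_info --param io all --level 9 | grep -i priority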
eproduce the issue with it.
Sorry for not being more helpful, but we are also scratching our heads trying
to understand what is going on and I just thought that maybe someone here has
had a similar experience in the past (or might give us some pointers on what to
look at).
Regards,
Michael
the job nodes using the -machinefile flag).
Has anyone encountered something similar or do you have an idea what I could do
to track down the problem?
Regards,
Michael
--
Michael Schlottke-Lakemper
SimLab Highly Scalable Fluids & Solids Engineering
Jülich Aachen Research Alliance (JARA
g has this error.
Has anyone seen this or might be able to offer an explanation? If it is a
false-positive, I’d be happy to suppress it :)
Thanks a lot in advance
Michael
P.S.: This error is not covered/suppressed by the default ompi suppression file
in $PREFIX/share/openmpi.
--
Michael Schl
I'm trying to get slurm and openmpi to cooperate when running multi-threaded
jobs. i'm sure i'm doing something wrong, but i can't figure
out what
my node configuration is
2 nodes
2 sockets
6 cores per socket
i want to run
sbatch -N2 -n 8 --ntasks-per-node=4 --cpus-per-task=3 -w node1,node2
prog
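As a sketch of one mapping that yields 4 tasks per node with 3 cores each (option spelling varies across Open MPI releases):
mpirun -np 8 --map-by socket:PE=3 --bind-to core ./prog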
count gets over a
certain point?
thanks
On Wed, Nov 5, 2014 at 5:51 PM, Friedley, Andrew
wrote:
> Hi Michael,
>
> From what I understand, this is an issue with the qib driver and PSM from
> RHEL 6.5 and 6.6, and will be fixed for 6.7. There is no functional change
> between qib
I'm getting the below message on my cluster(s). It seems to only
happen when I try to use more than 64 nodes (16 cores each). The
clusters are running RHEL 6.5 with Slurm and Openmpi-1.6.5 with PSM.
I'm using the OFED versions included with RHEL for infiniband support.
ipath_userinit: Mismatched