Re: [OMPI users] Quality and details of implementation for Neighborhood collective operations

2022-06-08 Thread Michael Thomadakis via users
I see, thanks Is there any plan to apply any optimizations on the Neighbor collectives at some point? regards Michael On Wed, Jun 8, 2022 at 1:29 PM George Bosilca wrote: > Michael, > > As far as I know none of the implementations of the > neighborhood collectives in OMPI are

[OMPI users] Quality and details of implementation for Neighborhood collective operations

2022-06-08 Thread Michael Thomadakis via users
" way to provide optimized neighborhood collectives? Thank you very much, Michael

Re: [OMPI users] [EXTERNAL] strange pml error

2021-11-03 Thread Michael Di Domenico via users
perhaps there is > different initialization that happens such that the offending device search > problem doesn't occur? > > > Thanks, > > David > > > > > From: Shrader, David Lee > Sent: Tuesday, November 2, 2021 2:09 P

[OMPI users] strange pml error

2021-11-02 Thread Michael Di Domenico via users
fairly frequently, but not everytime when trying to run xhpl on a new machine i'm bumping into this. it happens with a single node or multiple nodes node1 selected pml ob1, but peer on node1 selected pml ucx if i rerun the exact same command a few minutes later, it works fine. the machine is new

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-27 Thread Heinz, Michael William via users
with each other? Another idea that came to mind was to get an OpenMPI build that would not have any high performance fabric support and would only work via TCP. So any advice on how to accomplish my goal would be appreciated. I realize that performance-wise that is going to be quite... sad. But

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
Wednesday, May 19, 2021 11:31 AM To: Open MPI Users Cc: Heinz, Michael William Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath Just some more data from my OmniPath based cluster. There certainly was a change from 4.0.x to 4.1.x With 4.0.1 I would build openmpi with .

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
do it. However, note that the format of the string must be 16 hex digits, a hyphen, then 16 more hex digits. anything else will be rejected. Also, I have never tried doing this, YMMV. From: Heinz, Michael William Sent: Wednesday, May 19, 2021 10:35 AM To: Open MPI Users Cc: Ralph Castain Subj
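The format rule described above (16 hex digits, a hyphen, then 16 more hex digits) can be checked before the value is handed to anything else. A minimal Python sketch, assuming both upper- and lowercase hex digits are acceptable (the message does not say); `is_valid_key_string` is a hypothetical helper name, not part of Open MPI:

```python
import re

# Exactly 16 hex digits, a hyphen, then 16 more hex digits.
# Assumption: both upper- and lowercase hex digits are acceptable.
_KEY_PATTERN = re.compile(r"[0-9A-Fa-f]{16}-[0-9A-Fa-f]{16}")


def is_valid_key_string(value: str) -> bool:
    """Return True only if value matches the 16-hex, hyphen, 16-hex layout."""
    return _KEY_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("0123456789abcdef-fedcba9876543210", "0123456789abcdef"):
        print(candidate, is_valid_key_string(candidate))
```

Using `fullmatch` rather than `match` means trailing characters, including a stray newline, make the check fail, which mirrors the "anything else will be rejected" wording.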

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
nta Fe, ARGENTINA. Tel +54-342-4511594/95 ext 7062, fax: +54-342-4511169 What am I missing and how can I improve the performance? Regards, Pavel Mezentsev. On Mon, May 10, 2021 at 6:20 PM Heinz, Michael William < michael.william.he...@cornelisnetworks.com

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-10 Thread Heinz, Michael William via users
That warning is an annoying bit of cruft from the openib / verbs provider that can be ignored. (Actually, I recommend using "-btl ^openib" to suppress the warning.) That said, there is a known issue with selecting PSM2 and OMPI 4.1.0. I'm not sure that that's the problem you're hitting, though,

Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread Heinz, Michael William via users
ds... By the way, have you looked at using Easybuild? Would be good to have your input there maybe. On Wed, 7 Apr 2021 at 01:01, Heinz, Michael William via users <users@lists.open-mpi.org> wrote: I’m having a heck of a time building OMPI with Intel C. Compilation goes fine, ins

Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread Heinz, Michael William via users
Gilles, I’ll double check - but the intel runtime is installed on all machines in the fabric. - Michael Heinz michael.william.he...@cornelisnetworks.com On Apr 7, 2021, at 2:42 AM, Gilles Gouaillardet via users mailto:users@list

[OMPI users] Building Open-MPI with Intel C

2021-04-06 Thread Heinz, Michael William via users
rs_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x7fdaa23e1000) /lib64/ld-linux-x86-64.so.2 (0x7fdaa66d6000) Can anyone suggest what I'm forgetting to do? --- Michael Heinz Fabric Software Engineer, Cornelis Networks

Re: [OMPI users] Newbie With Issues

2021-03-30 Thread Michael Fuckner via users
/intel/oneapi/compiler/2021.2.0/linux/bin Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/10 Selected GCC installation: /usr/lib/gcc/x86_64-redhat-linux/10 Candidate multilib: .;@m64 Candidate multilib: 32;@m32 Selected multilib: .;@m64 Regards, Michael! > bend linux4ms.net

Re: [OMPI users] Newbie With Issues

2021-03-30 Thread Heinz, Michael William via users
It looks like you're trying to build Open MPI with the Intel C compiler. TBH - I think that icc isn't included with the latest release of oneAPI, I think they've switched to including clang instead. I had a similar issue to yours but I resolved it by installing a 2020 version of the Intel HPC so

Re: [OMPI users] [EXTERNAL] building openshem on opa

2021-03-22 Thread Michael Di Domenico via users
On Mon, Mar 22, 2021 at 11:13 AM Pritchard Jr., Howard wrote: > https://github.com/Sandia-OpenSHMEM/SOS > if you want to use OpenSHMEM over OPA. > If you have lots of cycles for development work, you could write an OFI SPML > for the OSHMEM component of Open MPI. thanks, i am aware of the sandi

[OMPI users] building openshem on opa

2021-03-22 Thread Michael Di Domenico via users
i can build and run openmpi on an opa network just fine, but it turns out building openshmem fails. the message is (no spml) found looking at the config log it looks like it tries to build spml ikrit and ucx which fail. i turn ucx off because it doesn't support opa and isn't needed. so this mes

Re: [OMPI users] Error intialising an OpenFabrics device.

2021-03-13 Thread Heinz, Michael William via users
I’ve begun getting this annoyingly generic warning, too. It appears to be coming from the openib provider. If you disable it with -mca btl ^openib the warning goes away. Sent from my iPad > On Mar 13, 2021, at 3:28 PM, Bob Beattie via users > wrote: > > Hi everyone, > > To be honest, as an MPI

Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?

2021-03-04 Thread Heinz, Michael William via users
What interconnect are you using at run time? That is, are you using Ethernet or InfiniBand or Omnipath? Sent from my iPad On Mar 4, 2021, at 5:05 AM, Raut, S Biplab via users wrote: [AMD Official Use Only - Internal Distribution Only] After downloading a particular openMPI version, let’s

[OMPI users] Unexpected issue with 4.1.x build

2021-03-02 Thread Heinz, Michael William via users
this might be happening? I do not see this with OMPI 4.0.3. --- Michael Heinz Fabric Software Engineer, Cornelis Networks

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-28 Thread Heinz, Michael William via users
Patrick, A few more questions for you: 1. What version of IFS are you running? 2. Are you using CUDA cards by any chance? If so, what version of CUDA? -Original Message- From: Heinz, Michael William Sent: Wednesday, January 27, 2021 3:45 PM To: Open MPI Users Subject: RE: [OMPI users

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Heinz, Michael William via users
Patrick, Do you have any PSM2_* or HFI_* environment variables defined in your run time environment that could be affecting things? -Original Message- From: users On Behalf Of Heinz, Michael William via users Sent: Wednesday, January 27, 2021 3:37 PM To: Open MPI Users Cc: Heinz

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Heinz, Michael William via users
Unfortunately, OPA/PSM support for Debian isn't handled by Intel directly or by Cornelis Networks - but I should point out you can download the latest official source for PSM2 and the drivers from Github. -Original Message- From: users On Behalf Of Michael Di Domenico via users

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Michael Di Domenico via users
tible with PSM and OPA when running specifically on debian (likely due to library versioning). i don't know how common that is, so it's not clear how flushed out and tested it is On Wed, Jan 27, 2021 at 3:07 PM Patrick Begou via users wrote: > > Hi Howard and Michael > > first man

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Michael Di Domenico via users
2021 at 3:44 PM Patrick Begou via users wrote: > > Hi Michael > > indeed I'm a little bit lost with all these parameters in OpenMPI, mainly > because for years it works just fine out of the box in all my deployments on > various architectures, interconnects and linux flavor. S

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-26 Thread Heinz, Michael William via users
Patrick how are you using original PSM if you’re using Omni-Path hardware? The original PSM was written for QLogic DDR and QDR Infiniband adapters. As far as needing openib - the issue is that the PSM2 MTL doesn’t support a subset of MPI operations that we previously used the pt2pt BTL for. For

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Heinz, Michael William via users
Patrick, is your application multi-threaded? PSM2 was not originally designed for multiple threads per process. I do know that the OSU alltoallV test does pass when I try it. Sent from my iPad > On Jan 25, 2021, at 12:57 PM, Patrick Begou via users > wrote: > > Hi Howard

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Heinz, Michael William via users
What happens if you specify -mtl ofi ? -Original Message- From: users On Behalf Of Patrick Begou via users Sent: Monday, January 25, 2021 12:54 PM To: users@lists.open-mpi.org Cc: Patrick Begou Subject: Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path Hi Howard and Michael, thanks

[OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Heinz, Michael William via users
Patrick, You really have to provide us some detailed information if you want assistance. At a minimum we need to know if you're using the PSM2 MTL or the OFI MTL and what the actual error is. Please provide the actual command line you are having problems with, along with any errors. In additio

Re: [OMPI users] Differences 4.0.3 -> 4.0.4 (Regression?)

2020-08-10 Thread Michael Fuckner via users
Hi, just tried 4.0.5rc1 and this is working as 4.0.3 (directly and via slurm). So it is just 4.0.4 not working. Diffed Config and build.sh, but couldn't find anything. I don't know why, but I'll accept it... Regards, Michael! On 08/08/2020 18:46, Howard Pritchard wrote:

Re: [OMPI users] Differences 4.0.3 -> 4.0.4 (Regression?)

2020-08-08 Thread Michael Fuckner via users
slurm support there is no need to # specify the number of processes or a hostfile to mpirun. /opt/openmpi/${OPENMPI}/gcc/bin/mpirun ${BIND_OPT} --mca pmix_base_verbose 100 --debug-daemons ./OWnetbench/OWnetbench.openmpi-${OPENMPI} done On 08/08/2020 18:46, Howard Pritchard wrote: Hello Mic

[OMPI users] Differences 4.0.3 -> 4.0.4 (Regression?)

2020-08-06 Thread Michael Fuckner via users
Hi, I have a small setup with one headnode and two compute nodes connected via IB-QDR running CentOS 8.2 and Mellanox OFED 4.9 LTS. I installed openmpi 3.0.6, 3.1.6, 4.0.3 and 4.0.4 with identical configuration (configure, compile, nothing configured in openmpi-mca-params.conf), the output fr

Re: [OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Heinz, Michael William via users
That's it! I was trying to remember what the setting was but I haven’t worked on those HCAs since around 2012, so it was faint. That said, I found the Intel TrueScale manual online at https://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/OFED_Host_Software_UserG

[OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Heinz, Michael William via users
Prentice, Avoiding the obvious question of whether your FM is running and the fabric is in an active state, it sounds like you're exhausting a resource on the cards. Ralph is correct about support for QLogic cards being long past but I’ll see what I can dig up in the archives on Monday to see if

Re: [OMPI users] openmpi/pmix/ucx

2020-02-07 Thread Michael Di Domenico via users
d to be what Mellanox used to configure OpenMPI in HPC-X > 2.5. > > I have users using GCC, PGI, Intel and AOCC compilers with this config. PGI > was the only one that > was a challenge to build due to conflicts with HCOLL. > > -Ray Muno > > On 2/7/20 10:04 AM, Michael Di

[OMPI users] openmpi/pmix/ucx

2020-02-07 Thread Michael Di Domenico via users
i haven't compiled openmpi in a while, but i'm in the process of upgrading our cluster. the last time i did this there were specific versions of mpi/pmix/ucx that were all tested and supposed to work together. my understanding of this was because pmi/ucx was under rapid development and the api's

[OMPI users] Subject: need a tool and its use to verify use of infiniband network

2020-01-16 Thread Heinz, Michael William via users
btl_base_verbose may do what you need. Add it to your mpirun arguments. For example: [LINUX hds1fna2271 20200116_1404 mpi_apps]# /usr/mpi/gcc/openmpi-3.1.6/bin/mpirun -np 2 -map-by node --allow-run-as-root -machinefile /usr/src/opa/mpi_apps/mpi_hosts -mca btl self,openib,vader -mca btl_base_ve

Re: [OMPI users] silent failure for large allgather

2019-09-25 Thread Heinz, Michael William via users
Emmanuel Thomé, Thanks for bringing this to our attention. It turns out this issue affects all OFI providers in open-mpi. We've applied a fix to the 3.0.x and later branches of open-mpi/ompi on github. However, you should be aware that this fix simply adds the appropriate error message, it does

Re: [OMPI users] local rank to rank comms

2019-03-20 Thread Michael Di Domenico
unfortunately it takes a while to export the data, but here's what i see On Mon, Mar 11, 2019 at 11:02 PM Gilles Gouaillardet wrote: > > Michael, > > > this is odd, I will have a look. > > Can you confirm you are running on a single node ? > > > At first, you

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet wrote: > You can force > mpirun --mca pml ob1 ... > And btl/vader (shared memory) will be used for intra node communications ... > unless MPI tasks are from different jobs (read MPI_Comm_spawn()) if i run mpirun -n 16 IMB-MPI1 alltoallv thing

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 12:19 PM Ralph H Castain wrote: > OFI uses libpsm2 underneath it when omnipath detected > > > On Mar 11, 2019, at 9:06 AM, Gilles Gouaillardet > > wrote: > > It might show that pml/cm and mtl/psm2 are used. In that case, then yes, > > the OmniPath library is used even fo

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 11:51 AM Ralph H Castain wrote: > You are probably using the ofi mtl - could be psm2 uses loopback method? according to ompi_info i do in fact have mtl's ofi,psm,psm2. i haven't changed any of the defaults, so are you saying order to change the behaviour i have to run mpi

[OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
i have a user that's claiming when two ranks on the same node want to talk with each other, they're using the NIC to talk rather than just talking directly. i've never had to test such a scenario. is there a way for me to prove one way or another whether two ranks are talking through say the kern

Re: [OMPI users] pmix and srun

2019-01-18 Thread Michael Di Domenico
s a typo in the v2.2.1 release. Sadly, our Slurm > > plugin folks seem to be off somewhere for awhile and haven’t been testing > > it. Sigh. > > > > I’ll patch the branch and let you know - we’d appreciate the feedback. > > Ralph > > > > > >> On

Re: [OMPI users] Fwd: pmix and srun

2019-01-18 Thread Michael Di Domenico
adding > > PMIX_MCA_pmix_client_event_verbose=5 > PMIX_MCA_pmix_server_event_verbose=5 > OMPI_MCA_pmix_base_verbose=10 > > to your environment and see if that provides anything useful. > > > On Jan 18, 2019, at 12:09 PM, Michael Di Domenico > > wrote: > > > > i compilie

[OMPI users] Fwd: pmix and srun

2019-01-18 Thread Michael Di Domenico
i compiled pmix slurm openmpi ---pmix ./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13 --disable-debug ---slurm ./configure --prefix=/hpc/slurm/18.08 --with-munge=/hpc/munge/0.5.13 --with-pmix=/hpc/pmix/2.2 ---openmpi ./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external --wit

Re: [OMPI users] OpenFabrics warning

2018-11-12 Thread Michael Di Domenico
On Mon, Nov 12, 2018 at 8:08 AM Andrei Berceanu wrote: > > Running a CUDA+MPI application on a node with 2 K80 GPUs, I get the following > warnings: > > -- > WARNING: There is at least non-excluded one OpenFabrics device foun

Re: [OMPI users] Problem running with UCX/oshmem on single node?

2018-05-14 Thread Michael Di Domenico
On Wed, May 9, 2018 at 9:45 PM, Howard Pritchard wrote: > > You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a > switch), and install that > on your system, or else install xpmem (https://github.com/hjelmn/xpmem). > Note there is a bug right now > in UCX that you may hit if

[OMPI users] shmem

2018-05-09 Thread Michael Di Domenico
before i debug ucx further (cause it's totally not working for me), i figured i'd check to see if it's *really* required to use shmem inside of openmpi. i'm pretty sure the answer is yes, but i wanted to double check.

Re: [OMPI users] openmpi/slurm/pmix

2018-04-25 Thread Michael Di Domenico
On Mon, Apr 23, 2018 at 6:07 PM, r...@open-mpi.org wrote: > Looks like the problem is that you didn’t wind up with the external PMIx. The > component listed in your error is the internal PMIx one which shouldn’t have > built given that configure line. > > Check your config.out and see what happe

[OMPI users] openmpi/slurm/pmix

2018-04-23 Thread Michael Di Domenico
i'm trying to get slurm 17.11.5 and openmpi 3.0.1 working with pmix. everything compiled, but when i run something i get: symbol lookup error: /openmpi/mca_pmix_pmix2x.so: undefined symbol: opal_libevent2022_evthread_use_pthreads i'm more than sure i did something wrong, but i'm not sure what, h

Re: [OMPI users] disabling libraries?

2018-04-10 Thread Michael Di Domenico
On Sat, Apr 7, 2018 at 3:50 PM, Jeff Squyres (jsquyres) wrote: > On Apr 6, 2018, at 8:12 AM, Michael Di Domenico > wrote: >> it would be nice if openmpi had (or may already have) a simple switch >> that lets me disable entire portions of the library chain, ie this >

Re: [OMPI users] disabling libraries?

2018-04-06 Thread Michael Di Domenico
On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet wrote: > That being said, the error suggest mca_oob_ud.so is a module from a > previous install, > Open MPI was not built on the system it is running, or libibverbs.so.1 > has been removed after > Open MPI was built. yes, understood, i compiled

[OMPI users] disabling libraries?

2018-04-05 Thread Michael Di Domenico
i'm trying to compile openmpi to support all of our interconnects, psm/openib/mxm/etc this works fine, openmpi finds all the libs, compiles and runs on each of the respective machines however, we don't install the libraries for everything everywhere so when i run things like ompi_info and mpirun

Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Michael A. Saverino
OK, Thanks for your help. Mike... On 02/26/2018 05:07 PM, Marco Atzeri wrote: > On 26/02/2018 22:57, Michael A. Saverino wrote: >> >> Marco, >> >> If you disable the loopback as well as the other adapters via Device >> Manager, you should be able to reproduc

Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Michael A. Saverino
Marco, If you disable the loopback as well as the other adapters via Device Manager, you should be able to reproduce the error. Mike... On 02/26/2018 04:51 PM, Marco Atzeri wrote: > On 26/02/2018 22:10, Michael A. Saverino wrote: >> >> Marco, >> >> I think oob still

Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Michael A. Saverino
answer Windows firewall questions (if enabled) permitting/not permitting orterun and my application. Do you have the Microsoft Loopback adapter installed on your system? Many Thanks, Mike... On 02/26/2018 02:11 PM, Marco Atzeri wrote: > On 26/02/2018 18:14, Michael A. Saverino wrote: >>

Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Michael A. Saverino
s other than > shared memory - note that you always must enable the “self” btl. > > Second, you likely also need to ensure that the OOB isn’t trying to use tcp, > so add “-mca oob ^tcp” to your cmd line. It shouldn’t be active anyway, but > better safe. > > >> On Feb 26

[OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Michael A. Saverino
following qualifiers in my OMPI command to no avail: --mca btl ^tcp,self,sm So the question is, am I able to disable TCP networking, either via command line or code, if I only plan to use cores on a single machine for OMPI execution? Many Thanks, Mike... -- Michael A.Saverino Contractor
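One thing worth checking in a qualifier such as `^tcp,self,sm`: as far as I understand Open MPI's MCA selection syntax, a leading `^` negates the entire comma-separated list, so that value would exclude `self` and `sm` along with `tcp`. A small Python sketch of that reading; `parse_mca_selection` is a hypothetical helper for illustration, not an Open MPI API:

```python
from typing import List, Tuple


def parse_mca_selection(value: str) -> Tuple[str, List[str]]:
    """Split an MCA component list into (mode, component names).

    Assumption: a leading '^' applies to the whole list, turning it into
    an exclusion list rather than excluding only the first name.
    """
    value = value.strip()
    mode = "include"
    if value.startswith("^"):
        mode = "exclude"
        value = value[1:]
    names = [name.strip() for name in value.split(",") if name.strip()]
    return mode, names


if __name__ == "__main__":
    print(parse_mca_selection("^tcp,self,sm"))
    print(parse_mca_selection("self,sm"))
```

Under that reading, an inclusive list such as `--mca btl self,sm` would be the form that keeps `self` and shared memory while leaving TCP out.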

[OMPI users] openmpi hang on IB disconnect

2018-01-17 Thread Michael Di Domenico
openmpi-2.0.2 running on rhel 7.4 with qlogic QDR infiniband switches/adapters, also using slurm i have a user that's running a job over multiple days. unfortunately after a few days at random the job will seemingly hang. the latest instance was caused by an infiniband adapter that went offline

Re: [OMPI users] Vague error message while executing MPI-Fortran program

2017-11-05 Thread Michael Mauersberger
Maybe you have an idea why it didn't work with those private variables? But - well, if not there would not be a problem any more (although I don't know why). ;) Best regards Michael

[OMPI users] Vague error message while executing MPI-Fortran program

2017-10-24 Thread Michael Mauersberger
ered a similar problem and is able to help me. I would be really grateful. Thanks, Michael

[OMPI users] openmpi mgmt traffic

2017-10-11 Thread Michael Di Domenico
my cluster nodes are connected on 1g ethernet eth0/eth1 and via infiniband rdma and ib0 my understanding is that openmpi will detect all these interfaces. using eth0/eth1 for connection setup and use rdma for msg passing what would be an appropriate to command line parameters to tell openmpi to i

[OMPI users] alltoallv

2017-10-10 Thread Michael Di Domenico
i'm getting stuck trying to run some fairly large IMB-MPI alltoall tests under openmpi 2.0.2 on rhel 7.4 i have two different clusters, one running mellanox fdr10 and one running qlogic qdr if i issue mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv the job just stalls after t

Re: [OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.

2017-09-20 Thread Michael Thomadakis
This discussion started getting into an interesting question: ABI standardization for portability by language. It makes sense to have ABI standardization for portability of objects across environments. At the same time it does mean that everyone follows the exact same recipe for low level implement

Re: [OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.

2017-09-18 Thread Michael Thomadakis
OMP is yet another source of incompatibility between GNU and Intel environments. So compiling say Fortran OMP code into a library and trying to link it with Intel Fortran codes just aggravates the problem. Michael On Mon, Sep 18, 2017 at 7:35 PM, Gilles Gouaillardet < gilles.gouail

Re: [OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.

2017-09-18 Thread Michael Thomadakis
different compilation environments. Thank you, Michael On Mon, Sep 18, 2017 at 7:35 PM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Even if i do not fully understand the question, keep in mind Open MPI > does not use OpenMP, so from that point of view, Open MPI is >

Re: [OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.

2017-09-18 Thread Michael Thomadakis
Thanks for the note. How about OMP runtimes though? Michael On Mon, Sep 18, 2017 at 3:21 PM, n8tm via users wrote: > On Linux and Mac, Intel c and c++ are sufficiently compatible with gcc and > g++ that this should be possible. This is not so for Fortran libraries or >

[OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.

2017-09-18 Thread Michael Thomadakis
OpenMPI compiler wrappers to use the Intel compiler set? Would there be any issues with compiling C++ / Fortran or corresponding OMP codes ? In general, what is clean way to build OpenMPI with a GNU compiler set but then instruct the wrappers to use Intel compiler set? Thanks! Michael

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-23 Thread Michael Di Domenico
On Thu, Jun 22, 2017 at 12:41 PM, r...@open-mpi.org wrote: > I gather you are using OMPI 2.x, yes? And you configured it > --with-pmi=, then moved the executables/libs to your > workstation? correct > I suppose I could state the obvious and say “don’t do that - just rebuild it” correct... bu

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread Michael Di Domenico
On Thu, Jun 22, 2017 at 10:43 AM, John Hearns via users wrote: > Having had some problems with ssh launching (a few minutes ago) I can > confirm that this works: > > --mca plm_rsh_agent "ssh -v" this doesn't do anything for me if i set OMPI_MCA_sec=^munge i can clear the mca_sec_munge error bu

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread Michael Di Domenico
35 AM, r...@open-mpi.org wrote: > You can add "OMPI_MCA_plm=rsh OMPI_MCA_sec=^munge” to your environment > > > On Jun 22, 2017, at 7:28 AM, John Hearns via users > wrote: > > Michael, try > --mca plm_rsh_agent ssh > > I've been fooling with this myself rec
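The suggestion above relies on Open MPI reading MCA parameters from environment variables named `OMPI_MCA_<parameter>`. A minimal Python sketch of building such an environment for a child process; `with_mca_params` is a hypothetical helper, and the parameter values are simply the ones quoted in the message:

```python
import os
from typing import Dict, Mapping, Optional


def with_mca_params(params: Mapping[str, str],
                    base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of base_env with each MCA parameter set as OMPI_MCA_<name>."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in params.items():
        env["OMPI_MCA_" + name] = value
    return env


if __name__ == "__main__":
    env = with_mca_params({"plm": "rsh", "sec": "^munge"})
    # The resulting dict could be passed as env= to subprocess.run([...]).
    print({k: v for k, v in env.items() if k.startswith("OMPI_MCA_")})
```

Building a copy instead of mutating `os.environ` keeps the settings scoped to the one launch being tested.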

[OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread Michael Di Domenico
is it possible to disable slurm/munge/psm/pmi(x) from the mpirun command line or (better) using environment variables? i'd like to use the installed version of openmpi i have on a workstation, but it's linked with slurm from one of my clusters. mpi/slurm work just fine on the cluster, but when i

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-25 Thread Michael Di Domenico
On Mon, Jul 25, 2016 at 4:53 AM, Gilles Gouaillardet wrote: > > as a workaround, you can configure without -noswitcherror. > > after you ran configure, you have to manually patch the generated 'libtool' > file and add the line with pgcc*) and the next line like this : > > /* if pgcc is used, libto

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-22 Thread Michael Di Domenico
pthread" from libslurm.la and libpmi.la >> >> On 07/11/2016 02:54 PM, Michael Di Domenico wrote: >>> >>> I'm trying to get openmpi compiled using the PGI compiler. >>> >>> the configure goes through and the code starts to compile, but the

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-14 Thread Michael Di Domenico
On Mon, Jul 11, 2016 at 9:52 AM, Åke Sandgren wrote: > Looks like you are compiling with slurm support. > > If so, you need to remove the "-pthread" from libslurm.la and libpmi.la i don't see a configure option in slurm to disable pthreads, so i'm not sure this is possible.

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-14 Thread Michael Di Domenico
On Thu, Jul 14, 2016 at 9:47 AM, Michael Di Domenico wrote: > Have 1.10.3 unpacked, ran through the configure using the same command > line options as 1.10.2 > > but it fails even earlier in the make process at > > Entering openmpi-1.10.3/opal/asm > CPPAS atomic-asm.lo >

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-14 Thread Michael Di Domenico
cense for the pgCC C++ compiler ? > fwiw, FreePGI on OSX has no C++ license and PGI C and gnu g++ does not work > together out of the box, hopefully I will have a fix ready sometimes this > week > > Cheers, > > Gilles > > > On Monday, July 11, 2016, Michael Di Domenico &

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-11 Thread Michael Di Domenico
On Mon, Jul 11, 2016 at 9:11 AM, Gilles Gouaillardet wrote: > Can you try the latest 1.10.3 instead ? i can but it'll take a few days to pull the software inside. > btw, do you have a license for the pgCC C++ compiler ? > fwiw, FreePGI on OSX has no C++ license and PGI C and gnu g++ does not wor

[OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-11 Thread Michael Di Domenico
I'm trying to get openmpi compiled using the PGI compiler. the configure goes through and the code starts to compile, but then gets hung up with entering: openmpi-1.10.2/opal/mca/common/pmi CC common_pmi.lo CCLD libmca_common_pmi.la pgcc-Error-Unknown switch: - pthread

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote: > I was looking for lines like" [nodexyz:17085] selected cm best priority 40" > and " [nodexyz:17099] select: component psm selected" this may have turned up more then i expected. i recompiled openmpi v1.8.4 as a test and reran the test

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:52 PM, Jeff Squyres (jsquyres) wrote: > Can you send all the information listed here? > > https://www.open-mpi.org/community/help/ > > (including the full output from the run with the PML/BTL/MTL/etc. verbosity) > > This will allow Matias to look through all the rele

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote: > I was looking for lines like" [nodexyz:17085] selected cm best priority 40" > and " [nodexyz:17099] select: component psm selected" i see cm best priority 20, which seems to relate to ob1 being selected. i don't see a mention of psm a

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 4:49 PM, Cabral, Matias A wrote: > I didn't go into the code to see who is actually calling this error message, > but I suspect this may be a generic error for "out of memory" kind of thing > and not specific to the queue pair. To confirm please add -mca > pml_base_verbos

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 3:37 PM, Cabral, Matias A wrote: > Hi Michael, > > I may be missing some context, if you are using the qlogic cards you will > always want to use the psm mtl (-mca pml cm -mca mtl psm) and not openib btl. > As Tom suggest, confirm the limits are setu

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 12:12 PM, Elken, Tom wrote: > Hi Mike, > > In this file, > $ cat /etc/security/limits.conf > ... > < do you see at the end ... > > > * hard memlock unlimited > * soft memlock unlimited > # -- All InfiniBand Settings End here -- > ? Yes. I double checked that it's set on a

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Thu, Mar 10, 2016 at 11:54 AM, Michael Di Domenico wrote: > when i try to run an openmpi job with >128 ranks (16 ranks per node) > using alltoall or alltoallv, i'm getting an error that the process was > unable to get a queue pair. > > i've checked the max lock

[OMPI users] locked memory and queue pairs

2016-03-10 Thread Michael Di Domenico
when i try to run an openmpi job with >128 ranks (16 ranks per node) using alltoall or alltoallv, i'm getting an error that the process was unable to get a queue pair. i've checked the max locked memory settings across my machines; using ulimit -l in and outside of mpirun and they're all set to u
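The locked-memory check described in this post (running ulimit -l inside and outside of mpirun) can also be done from within a process. The following is only a minimal sketch, not something taken from the thread: it uses Python's standard resource module, and the helper names describe_limit and memlock_limits are made up for illustration.

```python
import resource


def describe_limit(value):
    """Label an rlimit value; RLIMIT_MEMLOCK is in bytes, ulimit -l prints KiB."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"memlock soft={soft} hard={hard}")
```

Launching a script like this under mpirun on each node would show the limit the ranks actually inherit, which can differ from what an interactive shell reports.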

Re: [OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 1.8.7

2015-09-28 Thread Schlottke-Lakemper, Michael
and that I am not able to track down. Sorry for having wasted your collective time on this; if this error should arise again, I will try to get a proper Valgrind report with --enable-debug and report it here. Michael > On 30 Jul 2015, at 22:10, Nathan Hjelm wrote: > > > I agre

Re: [OMPI users] Oversubscription disabled by default in OpenMPI 1.8.7

2015-08-14 Thread Schlottke-Lakemper, Michael
Hi Ralph, Thanks a lot for the fast reply and the clarification. We’ve re-added the parameter to our MCA site configuration file. Michael On 14 Aug 2015, at 15:00, Ralph Castain wrote: During the 1.7 series, we changed things at the request of system adm

[OMPI users] Oversubscription disabled by default in OpenMPI 1.8.7

2015-08-14 Thread Schlottke-Lakemper, Michael
r a feature? We recently upgraded from 1.6.x to 1.8.7, and as far as I remember, in 1.6.x oversubscription was enabled by default. Regards, Michael P.S.: In ompi_info, both rmaps_base_no_oversubscribe and rmaps_base_oversubscribe are reported as “false”. Our prefix/etc/openmpi-mca-params.conf file is empty.

Re: [OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 1.8.7

2015-07-29 Thread Schlottke-Lakemper, Michael
If it is helpful, I can try to compile OpenMPI with debug information and get more details on the reported error. However, it would be good if someone could tell me the necessary compile flags (on top of -O0 -g) and it would take me probably 1-2 weeks to do it. Michael Original

Re: [OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 1.8.7

2015-07-28 Thread Schlottke-Lakemper, Michael
Hi Ralph, That’s what I suspected. Thank you for your confirmation. Michael On 25 Jul 2015, at 16:10, Ralph Castain wrote: Looks to me like a false positive - we do malloc some space, and do access different parts of it. However, it looks like we are insi

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
"io_ompio_delete_priority" (current value: "10", data source: default, level: 9 dev/all, type: int) So it seems we are indeed using ROMIO. Any suggestions what that means with respect to our file coherence issue? Regards, Michael On 23 Jul 2015, at 14:07 , Gilles Gouaillardet

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
Gilles (see other mail in thread) suggested, I am not sure whether we use romio or ompio, but I do not know how to find out. Michael

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
eproduce the issue with it. Sorry for not being more helpful, but we are also scratching our heads trying to understand what is going on and I just thought that maybe someone here has had a similar experience in the past (or might give us some pointers at what to look at). Regards, Michael

[OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
the job nodes using the -machinefile flag). Has anyone encountered something similar or do you have an idea what I could do to track down the problem? Regards, Michael -- Michael Schlottke-Lakemper SimLab Highly Scalable Fluids & Solids Engineering Jülich Aachen Research Alliance (JARA

[OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 1.8.7

2015-07-23 Thread Schlottke-Lakemper, Michael
g has this error. Has anyone seen this or might be able to offer an explanation? If it is a false-positive, I’d be happy to suppress it :) Thanks a lot in advance Michael P.S.: This error is not covered/suppressed by the default ompi suppression file in $PREFIX/share/openmpi. -- Michael Schl

[OMPI users] slurm openmpi 1.8.3 core bindings

2015-01-30 Thread Michael Di Domenico
I'm trying to get slurm and openmpi to cooperate when running multithreaded jobs. i'm sure i'm doing something wrong, but i can't figure out what. my node configuration is 2 nodes, 2 sockets, 6 cores per socket. i want to run sbatch -N2 -n 8 --ntasks-per-node=4 --cpus-per-task=3 -w node1,node2 prog
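As a sanity check on the numbers in this request, the arithmetic can be written out. This is just a sketch under the assumption that each task should get its own dedicated cores; the function names are hypothetical and it says nothing about how slurm or openmpi actually place or bind the tasks.

```python
def cores_needed_per_node(ntasks_per_node, cpus_per_task):
    """Cores one node must supply if every task gets its own cores."""
    return ntasks_per_node * cpus_per_task


def cores_on_node(sockets, cores_per_socket):
    """Physical cores available on one node."""
    return sockets * cores_per_socket


# Values from the post: --ntasks-per-node=4 --cpus-per-task=3,
# nodes with 2 sockets and 6 cores per socket.
needed = cores_needed_per_node(ntasks_per_node=4, cpus_per_task=3)
available = cores_on_node(sockets=2, cores_per_socket=6)
total_tasks = 2 * 4  # -N2 with --ntasks-per-node=4, which matches -n 8

print(f"needed={needed} available={available} total_tasks={total_tasks}")
```

The request asks for exactly as many cores per node as the hardware has, so there is no slack; with 3 cores per task and 6 cores per socket, two tasks fit on each socket without any task straddling sockets.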

Re: [OMPI users] ipath_userinit errors

2014-11-06 Thread Michael Di Domenico
count gets over a certain point? thanks On Wed, Nov 5, 2014 at 5:51 PM, Friedley, Andrew wrote: > Hi Michael, > > From what I understand, this is an issue with the qib driver and PSM from > RHEL 6.5 and 6.6, and will be fixed for 6.7. There is no functional change > between qib

[OMPI users] ipath_userinit errors

2014-11-04 Thread Michael Di Domenico
I'm getting the below message on my cluster(s). It seems to only happen when I try to use more than 64 nodes (16-cores each). The clusters are running RHEL 6.5 with Slurm and Openmpi-1.6.5 with PSM. I'm using the OFED versions included with RHEL for InfiniBand support. ipath_userinit: Mismatched
