[OMPI users] Some Questions on Building OMPI on Linux Em64t

2010-05-19 Thread Michael E. Thomadakis
MVAPICHv2 and IntelMPI? This is not a political issue since I am going to be providing all these MPI stacks to our users. Thank you so much for the great s/w ... best Michael % \ % Michael E. Thomadakis, Ph.D. Senior Lead

Re: [OMPI users] Some Questions on Building OMPI on Linux Em64t

2010-05-21 Thread Michael E. Thomadakis
Hello, I am resending this because I am not sure if it was sent out to the OMPI list. Any help would be greatly appreciated. best Michael On 05/19/10 13:19, Michael E. Thomadakis wrote: Hello, I would like to build OMPI V1.4.2 and make it available to our users at the Supercomputing

Re: [OMPI users] Some Questions on Building OMPI on Linux Em64t

2010-05-26 Thread Michael E. Thomadakis
Hi Josh thanks for the reply. pls see below ... On 05/26/10 09:24, Josh Hursey wrote: (Sorry for the delay, I missed the C/R question in the mail) On May 25, 2010, at 9:35 AM, Jeff Squyres wrote: On May 24, 2010, at 2:02 PM, Michael E. Thomadakis wrote: | > 2) I have installed blcr V0.

Re: [OMPI users] Some Questions on Building OMPI on Linux Em64t

2010-05-26 Thread Michael E. Thomadakis
v2.0, API v2.0, Component v1.4.2) does this mean we have the full affinity support included or do I need to involve HWLOC in any way ? On 05/25/10 08:35, Jeff Squyres wrote: On May 24, 2010, at 2:02 PM, Michael E. Thomadakis wrote: |> 1) high-resolution timers: how do I specify the H

[OMPI users] openmpi v1.5?

2010-07-19 Thread Michael Di Domenico
Since I am a SVN neophyte can anyone tell me when openmpi 1.5 is scheduled for release? And whether the Slurm srun changes are going to make it in? thanks

[OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 1.8.7

2015-07-23 Thread Schlottke-Lakemper, Michael
g has this error. Has anyone seen this or might be able to offer an explanation? If it is a false-positive, I’d be happy to suppress it :) Thanks a lot in advance Michael P.S.: This error is not covered/suppressed by the default ompi suppression file in $PREFIX/share/openmpi. -- Michael Schl

[OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
the job nodes using the -machinefile flag). Has anyone encountered something similar or do you have an idea what I could do to track down the problem? Regards, Michael -- Michael Schlottke-Lakemper SimLab Highly Scalable Fluids & Solids Engineering Jülich Aachen Research Alliance (JARA

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
eproduce the issue with it. Sorry for not being more helpful, but we are also scratching our heads trying to understand what is going on and I just thought that maybe someone here has had a similar experience in the past (or might give us some pointers at what to look at). Regards, Michael

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
Gilles (see other mail in thread) suggested, I am not sure whether we use romio or ompio, but I do not know how to find out. Michael

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
"io_ompio_delete_priority" (current value: "10", data source: default, level: 9 dev/all, type: int) So it seems we are indeed using ROMIO. Any suggestions what that means with respect to our file coherence issue? Regards, Michael On 23 Jul 2015, at 14:07 , Gilles Gouaillardet

Re: [OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 1.8.7

2015-07-28 Thread Schlottke-Lakemper, Michael
Hi Ralph, That’s what I suspected. Thank you for your confirmation. Michael On 25 Jul 2015, at 16:10 , Ralph Castain <r...@open-mpi.org> wrote: Looks to me like a false positive - we do malloc some space, and do access different parts of it. However, it looks like we are insi

Re: [OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 1.8.7

2015-07-29 Thread Schlottke-Lakemper, Michael
If it is helpful, I can try to compile OpenMPI with debug information and get more details on the reported error. However, it would be good if someone could tell me the necessary compile flags (on top of -O0 -g) and it would take me probably 1-2 weeks to do it. Michael Original

[OMPI users] Oversubscription disabled by default in OpenMPI 1.8.7

2015-08-14 Thread Schlottke-Lakemper, Michael
r a feature? We recently upgraded from 1.6.x to 1.8.7, and as far as I remember, in 1.6.x oversubscription was enabled by default. Regards, Michael P.S.: In ompi_info, both rmaps_base_no_oversubscribe and rmaps_base_oversubscribe are reported as “false”. Our prefix/etc/openmpi-mca-params.conf file is empty.

Re: [OMPI users] Oversubscription disabled by default in OpenMPI 1.8.7

2015-08-14 Thread Schlottke-Lakemper, Michael
Hi Ralph, Thanks a lot for the fast reply and the clarification. We’ve re-added the parameter to our MCA site configuration file. Michael On 14 Aug 2015, at 15:00 , Ralph Castain <r...@open-mpi.org> wrote: During the 1.7 series, we changed things at the request of system adm

Re: [OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 1.8.7

2015-09-28 Thread Schlottke-Lakemper, Michael
and that I am not able to track down. Sorry for having wasted your collective time on this; if this error should arise again, I will try to get a proper Valgrind report with --enable-debug and report it here. Michael > On 30 Jul 2015, at 22:10 , Nathan Hjelm wrote: > > > I agre

[OMPI users] locked memory and queue pairs

2016-03-10 Thread Michael Di Domenico
when i try to run an openmpi job with >128 ranks (16 ranks per node) using alltoall or alltoallv, i'm getting an error that the process was unable to get a queue pair. i've checked the max locked memory settings across my machines; using ulimit -l in and outside of mpirun and they're all set to u

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Thu, Mar 10, 2016 at 11:54 AM, Michael Di Domenico wrote: > when i try to run an openmpi job with >128 ranks (16 ranks per node) > using alltoall or alltoallv, i'm getting an error that the process was > unable to get a queue pair. > > i've checked the max lock

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 12:12 PM, Elken, Tom wrote: > Hi Mike, > > In this file, > $ cat /etc/security/limits.conf > ... > < do you see at the end ... > > > * hard memlock unlimited > * soft memlock unlimited > # -- All InfiniBand Settings End here -- > ? Yes. I double checked that it's set on a

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 3:37 PM, Cabral, Matias A wrote: > Hi Michael, > > I may be missing some context, if you are using the qlogic cards you will > always want to use the psm mtl (-mca pml cm -mca mtl psm) and not openib btl. > As Tom suggest, confirm the limits are setu

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 4:49 PM, Cabral, Matias A wrote: > I didn't go into the code to see who is actually calling this error message, > but I suspect this may be a generic error for "out of memory" kind of thing > and not specific to the queue pair. To confirm please add -mca > pml_base_verbos

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote: > I was looking for lines like" [nodexyz:17085] selected cm best priority 40" > and " [nodexyz:17099] select: component psm selected" i see cm best priority 20, which seems to relate to ob1 being selected. i don't see a mention of psm a

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:52 PM, Jeff Squyres (jsquyres) wrote: > Can you send all the information listed here? > > https://www.open-mpi.org/community/help/ > > (including the full output from the run with the PML/BTL/MTL/etc. verbosity) > > This will allow Matias to look through all the rele

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote: > I was looking for lines like" [nodexyz:17085] selected cm best priority 40" > and " [nodexyz:17099] select: component psm selected" this may have turned up more than i expected. i recompiled openmpi v1.8.4 as a test and reran the test

[OMPI users] alltoallv

2017-10-10 Thread Michael Di Domenico
i'm getting stuck trying to run some fairly large IMB-MPI alltoall tests under openmpi 2.0.2 on rhel 7.4 i have two different clusters, one running mellanox fdr10 and one running qlogic qdr if i issue mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv the job just stalls after t

[OMPI users] openmpi mgmt traffic

2017-10-11 Thread Michael Di Domenico
my cluster nodes are connected on 1g ethernet eth0/eth1 and via infiniband rdma and ib0 my understanding is that openmpi will detect all these interfaces. using eth0/eth1 for connection setup and use rdma for msg passing what would be the appropriate command line parameters to tell openmpi to i

[OMPI users] openmpi hang on IB disconnect

2018-01-17 Thread Michael Di Domenico
openmpi-2.0.2 running on rhel 7.4 with qlogic QDR infiniband switches/adapters, also using slurm i have a user that's running a job over multiple days. unfortunately after a few days at random the job will seemingly hang. the latest instance was caused by an infiniband adapter that went offline

[OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Michael A. Saverino
following qualifiers in my OMPI command to no avail: --mca btl ^tcp,self,sm So the question is, am I able to disable TCP networking, either via command line or code, if I only plan to use cores on a single machine for OMPI execution? Many Thanks, Mike... -- Michael A.Saverino Contractor

Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Michael A. Saverino
s other than > shared memory - note that you always must enable the “self” btl. > > Second, you likely also need to ensure that the OOB isn’t trying to use tcp, > so add “-mca oob ^tcp” to your cmd line. It shouldn’t be active anyway, but > better safe. > > >> On Feb 26

Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Michael A. Saverino
answer Windows firewall questions (if enabled) permitting/not permitting orterun and my application.  Do you have the Microsoft Loopback adapter installed on your system? Many Thanks, Mike... On 02/26/2018 02:11 PM, Marco Atzeri wrote: > On 26/02/2018 18:14, Michael A. Saverino wrote: >>

Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Michael A. Saverino
Marco, If you disable the loopback as well as the other adapters via Device Manager, you should be able to reproduce the error. Mike... On 02/26/2018 04:51 PM, Marco Atzeri wrote: > On 26/02/2018 22:10, Michael A. Saverino wrote: >> >> Marco, >> >> I think oob still

Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread Michael A. Saverino
OK, Thanks for your help. Mike... On 02/26/2018 05:07 PM, Marco Atzeri wrote: > On 26/02/2018 22:57, Michael A. Saverino wrote: >> >> Marco, >> >> If you disable the loopback as well as the other adapters via Device >> Manager, you should be able to reproduc

[OMPI users] disabling libraries?

2018-04-05 Thread Michael Di Domenico
i'm trying to compile openmpi to support all of our interconnects, psm/openib/mxm/etc this works fine, openmpi finds all the libs, compiles and runs on each of the respective machines however, we don't install the libraries for everything everywhere so when i run things like ompi_info and mpirun

Re: [OMPI users] disabling libraries?

2018-04-06 Thread Michael Di Domenico
On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet wrote: > That being said, the error suggests mca_oob_ud.so is a module from a > previous install, > Open MPI was not built on the system it is running, or libibverbs.so.1 > has been removed after > Open MPI was built. yes, understood, i compiled

Re: [OMPI users] disabling libraries?

2018-04-10 Thread Michael Di Domenico
On Sat, Apr 7, 2018 at 3:50 PM, Jeff Squyres (jsquyres) wrote: > On Apr 6, 2018, at 8:12 AM, Michael Di Domenico > wrote: >> it would be nice if openmpi had (or may already have) a simple switch >> that lets me disable entire portions of the library chain, ie this >

[OMPI users] openmpi/slurm/pmix

2018-04-23 Thread Michael Di Domenico
i'm trying to get slurm 17.11.5 and openmpi 3.0.1 working with pmix. everything compiled, but when i run something it gets: symbol lookup error: /openmpi/mca_pmix_pmix2x.so: undefined symbol: opal_libevent2022_evthread_use_pthreads i'm more than sure i did something wrong, but i'm not sure what, h

Re: [OMPI users] openmpi/slurm/pmix

2018-04-25 Thread Michael Di Domenico
On Mon, Apr 23, 2018 at 6:07 PM, r...@open-mpi.org wrote: > Looks like the problem is that you didn’t wind up with the external PMIx. The > component listed in your error is the internal PMIx one which shouldn’t have > built given that configure line. > > Check your config.out and see what happe

[OMPI users] shmem

2018-05-09 Thread Michael Di Domenico
before i debug ucx further (cause it's totally not working for me), i figured i'd check to see if it's *really* required to use shmem inside of openmpi. i'm pretty sure the answer is yes, but i wanted to double check.

Re: [OMPI users] Problem running with UCX/oshmem on single node?

2018-05-14 Thread Michael Di Domenico
On Wed, May 9, 2018 at 9:45 PM, Howard Pritchard wrote: > > You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a > switch), and install that > on your system, or else install xpmem (https://github.com/hjelmn/xpmem). > Note there is a bug right now > in UCX that you may hit if

Re: [OMPI users] OpenFabrics warning

2018-11-12 Thread Michael Di Domenico
On Mon, Nov 12, 2018 at 8:08 AM Andrei Berceanu wrote: > > Running a CUDA+MPI application on a node with 2 K80 GPUs, I get the following > warnings: > > -- > WARNING: There is at least non-excluded one OpenFabrics device foun

[OMPI users] Fwd: pmix and srun

2019-01-18 Thread Michael Di Domenico
i compiled pmix slurm openmpi ---pmix ./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13 --disable-debug ---slurm ./configure --prefix=/hpc/slurm/18.08 --with-munge=/hpc/munge/0.5.13 --with-pmix=/hpc/pmix/2.2 ---openmpi ./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external --wit

Re: [OMPI users] Fwd: pmix and srun

2019-01-18 Thread Michael Di Domenico
adding > > PMIX_MCA_pmix_client_event_verbose=5 > PMIX_MCA_pmix_server_event_verbose=5 > OMPI_MCA_pmix_base_verbose=10 > > to your environment and see if that provides anything useful. > > > On Jan 18, 2019, at 12:09 PM, Michael Di Domenico > > wrote: > > > > i compilie

Re: [OMPI users] pmix and srun

2019-01-18 Thread Michael Di Domenico
s a typo in the v2.2.1 release. Sadly, our Slurm > > plugin folks seem to be off somewhere for awhile and haven’t been testing > > it. Sigh. > > > > I’ll patch the branch and let you know - we’d appreciate the feedback. > > Ralph > > > > > >> On

[OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
i have a user that's claiming when two ranks on the same node want to talk with each other, they're using the NIC to talk rather than just talking directly. i've never had to test such a scenario. is there a way for me to prove one way or another whether two ranks are talking through say the kern

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 11:51 AM Ralph H Castain wrote: > You are probably using the ofi mtl - could be psm2 uses loopback method? according to ompi_info i do in fact have mtl's ofi,psm,psm2. i haven't changed any of the defaults, so are you saying in order to change the behaviour i have to run mpi

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 12:19 PM Ralph H Castain wrote: > OFI uses libpsm2 underneath it when omnipath detected > > > On Mar 11, 2019, at 9:06 AM, Gilles Gouaillardet > > wrote: > > It might show that pml/cm and mtl/psm2 are used. In that case, then yes, > > the OmniPath library is used even fo

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet wrote: > You can force > mpirun --mca pml ob1 ... > And btl/vader (shared memory) will be used for intra node communications ... > unless MPI tasks are from different jobs (read MPI_Comm_spawn()) if i run mpirun -n 16 IMB-MPI1 alltoallv thing

Re: [OMPI users] local rank to rank comms

2019-03-20 Thread Michael Di Domenico
unfortunately it takes a while to export the data, but here's what i see On Mon, Mar 11, 2019 at 11:02 PM Gilles Gouaillardet wrote: > > Michael, > > > this is odd, I will have a look. > > Can you confirm you are running on a single node ? > > > At first, you

[OMPI users] configure openmpi with support for Sun gridengine (SGE)?

2006-12-19 Thread Michael John Hanby
Howdy, I'm compiling OpenMPI 1.1.2 on a Rocks Cluster 4.2.1. The cluster has sge installed which is what the users will use to submit their MPI jobs (i.e. using qsub). I'm not having any luck finding the correct parameters to provide ./configure in order to include support for SGE. Here's the co

Re: [OMPI users] configure openmpi with support for Sun gridengine(SGE)?

2006-12-19 Thread Michael John Hanby
Never mind, I found it in the FAQ, need version 1.2 of OpenMPI. I'll give that a go. -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Michael John Hanby Sent: Tuesday, December 19, 2006 2:28 PM To: Open MPI Users Subject: [OMPI

[OMPI users] Infiniband - Any suggestions on "How can you prove to me that OpenMPI is using it?"

2006-12-20 Thread Michael John Hanby
Howdy, I'm new to cluster administration, MPI and high speed networks. I've compiled my OpenMPI using these settings: ./configure CC='icc' CXX='icpc' FC='ifort' F77='ifort' --with-mvapi=/usr/local/topspin --with-mvapi-libdir=/usr/local/topspin/lib64 --enable-static --prefix=/share/apps/openmpi/1.

Re: [OMPI users] Infiniband - Any suggestions on "How can you prove to me that OpenMPI is using it?"

2006-12-21 Thread Michael John Hanby
Thanks Jeff and Andrew, those responses were well thought out and very informative. I'll run the same test explicitly using TCP for a couple runs and then mvapi a couple times. The results should give the users a warm fuzzy that OpenMPI is in fact using the expensive faster network (rather than c

[OMPI users] Differences 4.0.3 -> 4.0.4 (Regression?)

2020-08-06 Thread Michael Fuckner via users
Hi, I have a small setup with one headnode and two compute nodes connected via IB-QDR running CentOS 8.2 and Mellanox OFED 4.9 LTS. I installed openmpi 3.0.6, 3.1.6, 4.0.3 and 4.0.4 with identical configuration (configure, compile, nothing configured in openmpi-mca-params.conf), the output fr

Re: [OMPI users] Differences 4.0.3 -> 4.0.4 (Regression?)

2020-08-08 Thread Michael Fuckner via users
slurm support there is no need to # specify the number of processes or a hostfile to mpirun. /opt/openmpi/${OPENMPI}/gcc/bin/mpirun ${BIND_OPT} --mca pmix_base_verbose 100 --debug-daemons ./OWnetbench/OWnetbench.openmpi-${OPENMPI} done On 08/08/2020 18:46, Howard Pritchard wrote: Hello Mic

Re: [OMPI users] Differences 4.0.3 -> 4.0.4 (Regression?)

2020-08-10 Thread Michael Fuckner via users
Hi, just tried 4.0.5rc1 and this is working as 4.0.3 (directly and via slurm). So it is just 4.0.4 not working. Diffed Config and build.sh, but couldn't find anything. I don't know why, but I'll accept it... Regards, Michael! On 08/08/2020 18:46, Howard Pritchard wrote:

Re: [OMPI users] Newbie With Issues

2021-03-30 Thread Michael Fuckner via users
/intel/oneapi/compiler/2021.2.0/linux/bin Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/10 Selected GCC installation: /usr/lib/gcc/x86_64-redhat-linux/10 Candidate multilib: .;@m64 Candidate multilib: 32;@m32 Selected multilib: .;@m64 Regards, Michael! > bend linux4ms.net

Re: [OMPI users] Datatype construction, serious limitation (was: Signal: Segmentation fault (11) Problem)

2007-04-19 Thread Michael Gauckler (mailing lists)
n and that you are willing to fix the issue in upcoming releases of Open-MPI. If there is anything else I can help with, please let me know. Regards, Michael Gauckler [1] http://lists.boost.org/Archives/boost/2007/01/115347.php [2] http://lists.boost.org/boost-announce/2006/09/0099.php [3]

[OMPI users] Quality and details of implementation for Neighborhood collective operations

2022-06-08 Thread Michael Thomadakis via users
" way to provide optimized neighborhood collectives? Thank you much Michael

Re: [OMPI users] Quality and details of implementation for Neighborhood collective operations

2022-06-08 Thread Michael Thomadakis via users
I see, thanks Is there any plan to apply any optimizations on the Neighbor collectives at some point? regards Michael On Wed, Jun 8, 2022 at 1:29 PM George Bosilca wrote: > Michael, > > As far as I know none of the implementations of the > neighborhood collectives in OMPI are

Re: [OMPI users] silent failure for large allgather

2019-09-25 Thread Heinz, Michael William via users
Emmanuel Thomé, Thanks for bringing this to our attention. It turns out this issue affects all OFI providers in open-mpi. We've applied a fix to the 3.0.x and later branches of open-mpi/ompi on github. However, you should be aware that this fix simply adds the appropriate error message, it does

[OMPI users] Subject: need a tool and its use to verify use of infiniband network

2020-01-16 Thread Heinz, Michael William via users
btl_base_verbose may do what you need. Add it to your mpirun arguments. For example: [LINUX hds1fna2271 20200116_1404 mpi_apps]# /usr/mpi/gcc/openmpi-3.1.6/bin/mpirun -np 2 -map-by node --allow-run-as-root -machinefile /usr/src/opa/mpi_apps/mpi_hosts -mca btl self,openib,vader -mca btl_base_ve

[OMPI users] openmpi/pmix/ucx

2020-02-07 Thread Michael Di Domenico via users
i haven't compiled openmpi in a while, but i'm in the process of upgrading our cluster. the last time i did this there were specific versions of mpi/pmix/ucx that were all tested and supposed to work together. my understanding of this was because pmi/ucx was under rapid development and the api's

Re: [OMPI users] openmpi/pmix/ucx

2020-02-07 Thread Michael Di Domenico via users
d to be what Mellanox used to configure OpenMPI in HPC-X > 2.5. > > I have users using GCC, PGI, Intel and AOCC compilers with this config. PGI > was the only one that > was a challenge to build due to conflicts with HCOLL. > > -Ray Muno > > On 2/7/20 10:04 AM, Michael Di

[OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Heinz, Michael William via users
Prentice, Avoiding the obvious question of whether your FM is running and the fabric is in an active state, it sounds like you're exhausting a resource on the cards. Ralph is correct about support for QLogic cards being long past but I’ll see what I can dig up in the archives on Monday to see if

Re: [OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Heinz, Michael William via users
That's it! I was trying to remember what the setting was but I haven’t worked on those HCAs since around 2012, so it was faint. That said, I found the Intel TrueScale manual online at https://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/OFED_Host_Software_UserG

[OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Heinz, Michael William via users
Patrick, You really have to provide us some detailed information if you want assistance. At a minimum we need to know if you're using the PSM2 MTL or the OFI MTL and what the actual error is. Please provide the actual command line you are having problems with, along with any errors. In additio

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Heinz, Michael William via users
What happens if you specify -mca mtl ofi ? -Original Message- From: users On Behalf Of Patrick Begou via users Sent: Monday, January 25, 2021 12:54 PM To: users@lists.open-mpi.org Cc: Patrick Begou Subject: Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path Hi Howard and Michael, thanks

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Heinz, Michael William via users
Patrick, is your application multi-threaded? PSM2 was not originally designed for multiple threads per process. I do know that the OSU alltoallV test does pass when I try it. Sent from my iPad > On Jan 25, 2021, at 12:57 PM, Patrick Begou via users > wrote: > > Hi Howard

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-26 Thread Heinz, Michael William via users
Patrick how are you using original PSM if you’re using Omni-Path hardware? The original PSM was written for QLogic DDR and QDR Infiniband adapters. As far as needing openib - the issue is that the PSM2 MTL doesn’t support a subset of MPI operations that we previously used the pt2pt BTL for. For

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Michael Di Domenico via users
2021 at 3:44 PM Patrick Begou via users wrote: > > Hi Michael > > indeed I'm a little bit lost with all these parameters in OpenMPI, mainly > because for years it works just fine out of the box in all my deployments on > various architectures, interconnects and linux flavor. S

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Michael Di Domenico via users
tible with PSM and OPA when running specifically on debian (likely due to library versioning). i don't know how common that is, so it's not clear how fleshed out and tested it is On Wed, Jan 27, 2021 at 3:07 PM Patrick Begou via users wrote: > > Hi Howard and Michael > > first man

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Heinz, Michael William via users
Unfortunately, OPA/PSM support for Debian isn't handled by Intel directly or by Cornelis Networks - but I should point out you can download the latest official source for PSM2 and the drivers from Github. -Original Message- From: users On Behalf Of Michael Di Domenico via users

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Heinz, Michael William via users
Patrick, Do you have any PSM2_* or HFI_* environment variables defined in your run time environment that could be affecting things? -Original Message- From: users On Behalf Of Heinz, Michael William via users Sent: Wednesday, January 27, 2021 3:37 PM To: Open MPI Users Cc: Heinz

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-28 Thread Heinz, Michael William via users
Patrick, A few more questions for you: 1. What version of IFS are you running? 2. Are you using CUDA cards by any chance? If so, what version of CUDA? -Original Message- From: Heinz, Michael William Sent: Wednesday, January 27, 2021 3:45 PM To: Open MPI Users Subject: RE: [OMPI users

[OMPI users] Unexpected issue with 4.1.x build

2021-03-02 Thread Heinz, Michael William via users
this might be happening? I do not see this with OMPI 4.0.3. --- Michael Heinz Fabric Software Engineer, Cornelis Networks

Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?

2021-03-04 Thread Heinz, Michael William via users
What interconnect are you using at run time? That is, are you using Ethernet or InfiniBand or Omnipath? Sent from my iPad On Mar 4, 2021, at 5:05 AM, Raut, S Biplab via users wrote:  [AMD Official Use Only - Internal Distribution Only] After downloading a particular openMPI version, let’s

Re: [OMPI users] Error intialising an OpenFabrics device.

2021-03-13 Thread Heinz, Michael William via users
I’ve begun getting this annoyingly generic warning, too. It appears to be coming from the openib provider. If you disable it with -mca btl ^openib the warning goes away. Sent from my iPad > On Mar 13, 2021, at 3:28 PM, Bob Beattie via users > wrote: > > Hi everyone, > > To be honest, as an MPI

[OMPI users] building openshem on opa

2021-03-22 Thread Michael Di Domenico via users
i can build and run openmpi on an opa network just fine, but it turns out building openshmem fails. the message is (no spml) found looking at the config log it looks like it tries to build spml ikrit and ucx which fail. i turn ucx off because it doesn't support opa and isn't needed. so this mes

Re: [OMPI users] [EXTERNAL] building openshem on opa

2021-03-22 Thread Michael Di Domenico via users
On Mon, Mar 22, 2021 at 11:13 AM Pritchard Jr., Howard wrote: > https://github.com/Sandia-OpenSHMEM/SOS > if you want to use OpenSHMEM over OPA. > If you have lots of cycles for development work, you could write an OFI SPML > for the OSHMEM component of Open MPI. thanks, i am aware of the sandi

Re: [OMPI users] Newbie With Issues

2021-03-30 Thread Heinz, Michael William via users
It looks like you're trying to build Open MPI with the Intel C compiler. TBH - I think that icc isn't included with the latest release of oneAPI, I think they've switched to including clang instead. I had a similar issue to yours but I resolved it by installing a 2020 version of the Intel HPC so

[OMPI users] Building Open-MPI with Intel C

2021-04-06 Thread Heinz, Michael William via users
rs_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x7fdaa23e1000) /lib64/ld-linux-x86-64.so.2 (0x7fdaa66d6000) Can anyone suggest what I'm forgetting to do? --- Michael Heinz Fabric Software Engineer, Cornelis Networks

Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread Heinz, Michael William via users
Gilles, I’ll double check - but the intel runtime is installed on all machines in the fabric. - Michael Heinz michael.william.he...@cornelisnetworks.com On Apr 7, 2021, at 2:42 AM, Gilles Gouaillardet via users mailto:users@list

Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread Heinz, Michael William via users
ds... By the way, have you looked at using Easybuild? Would be good to have your input there maybe. On Wed, 7 Apr 2021 at 01:01, Heinz, Michael William via users <users@lists.open-mpi.org> wrote: I’m having a heck of a time building OMPI with Intel C. Compilation goes fine, ins

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-10 Thread Heinz, Michael William via users
That warning is an annoying bit of cruft from the openib / verbs provider that can be ignored. (Actually, I recommend using "-mca btl ^openib" to suppress the warning.) That said, there is a known issue with selecting PSM2 and OMPI 4.1.0. I'm not sure that that's the problem you're hitting, though,

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
nta Fe, ARGENTINA. Tel +54-342-4511594/95 ext 7062, fax: +54-342-4511169 What am I missing and how can I improve the performance? Regards, Pavel Mezentsev. On Mon, May 10, 2021 at 6:20 PM Heinz, Michael William < michael.william.he...@cornelisnetworks.com<mailto:michael.william.he.

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
do it. However, note that the format of the string must be 16 hex digits, a hyphen, then 16 more hex digits. anything else will be rejected. Also, I have never tried doing this, YMMV. From: Heinz, Michael William Sent: Wednesday, May 19, 2021 10:35 AM To: Open MPI Users Cc: Ralph Castain Subj
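The format constraint stated in the message above (16 hex digits, a hyphen, then 16 more hex digits) could be checked before a job is launched. A minimal sketch in Python, assuming a hypothetical helper name `is_valid_key` and assuming that both upper- and lower-case hex digits are acceptable, which the message does not say:

```python
import re

# Layout described in the message: 16 hex digits, a hyphen, 16 more hex digits.
# Accepting both upper- and lower-case digits is an assumption.
_KEY_RE = re.compile(r"[0-9a-fA-F]{16}-[0-9a-fA-F]{16}")


def is_valid_key(value: str) -> bool:
    """Return True if value has the 16-hex, hyphen, 16-hex layout (hypothetical helper)."""
    # fullmatch rejects anything with extra leading or trailing characters.
    return _KEY_RE.fullmatch(value) is not None
```

Calling `is_valid_key` on a candidate string before exporting it would catch a malformed value early instead of having it rejected at launch.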

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
Wednesday, May 19, 2021 11:31 AM To: Open MPI Users Cc: Heinz, Michael William Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath Just some more data from my OmniPath based cluster. There certainly was a change from 4.0.x to 4.1.x With 4.0.1 I would build openmpi with .

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-27 Thread Heinz, Michael William via users
with each other? Another idea that came to mind was to get an OpenMPI build that would not have any high performance fabric support and would only work via TCP. So any advice on how to accomplish my goal would be appreciated. I realize that performance-wise that is going to be quite... sad. But

[OMPI users] strange pml error

2021-11-02 Thread Michael Di Domenico via users
fairly frequently, but not every time when trying to run xhpl on a new machine i'm bumping into this. it happens with a single node or multiple nodes node1 selected pml ob1, but peer on node1 selected pml ucx if i rerun the exact same command a few minutes later, it works fine. the machine is new

Re: [OMPI users] [EXTERNAL] strange pml error

2021-11-03 Thread Michael Di Domenico via users
perhaps there is > different initialization that happens such that the offending device search > problem doesn't occur? > > > Thanks, > > David > > > > > From: Shrader, David Lee > Sent: Tuesday, November 2, 2021 2:09 P
