Re: [OMPI users] OpenMPI 4.1.1, CentOS 7.9, nVidia HPC-SDK, build hints?

2021-09-30 Thread Raymond Muno via users
 Added --enable-mca-no-build=op-avx to the configure line. Still dies in the same place. CCLD mca_op_avx.la ./.libs/liblocal_ops_avx512.a(liblocal_ops_avx512_la-op_avx_functions.o):(.data+0x0): multiple definition of `ompi_op_avx_functions_avx2' ./.libs/liblocal_ops_avx2.a(liblocal_ops_a
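
For reference, --enable-mca-no-build is Open MPI's generic flag for skipping the build of individual MCA components. A minimal sketch of the kind of configure invocation involved (install prefix, compiler names, and job count are illustrative placeholders, not taken from the original report):

    ./configure --prefix=/opt/openmpi-4.1.1 \
        CC=nvc CXX=nvc++ FC=nvfortran \
        --enable-mca-no-build=op-avx
    make -j 8 && make install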

[OMPI users] OpenMPI 4.0.2 with PGI 19.10, will not build with hcoll

2020-01-24 Thread Raymond Muno via users
I am having issues building OpenMPI 4.0.2 using the PGI 19.10 compilers. OS is CentOS 7.7 with MLNX_OFED 4.7.3. It dies at: PGC/x86-64 Linux 19.10-0: compilation completed with warnings   CCLD mca_coll_hcoll.la pgcc-Error-Unknown switch: -pthread make[2]: *** [mca_coll_hcoll.la] Error 1 make[2]
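
The failure is pgcc rejecting the -pthread switch that the hcoll libtool link line passes. One workaround that has circulated on this list (unverified here; the siterc path and the $PGI variable below are assumptions about the PGI install layout) is to teach the PGI driver how to handle the switch via a siterc file:

    # assumed location; adjust to your PGI/NVHPC installation
    cat >> $PGI/linux86-64/19.10/bin/siterc <<'EOF'
    switch -pthread is replace(-lpthread) positional(linker);
    EOF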

Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Raymond Muno via users
AMD lists the minimum supported kernel for EPYC/Naples as RHEL/CentOS kernel 3.10-862, which is RHEL/CentOS 7.5 or later. Upgraded kernels can be used on 7.4. http://developer.amd.com/wp-content/resources/56420.pdf -Ray Muno On 1/8/20 7:37 PM, Raymond Muno wrote: We are running EPYC 7451 and

Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Raymond Muno via users
We are running EPYC 7451 and 7702 nodes. I do not recall that CentOS 6 was able to support these. We moved on to CentOS 7.6 at first and are now running 7.7 to support the EPYC2/Rome nodes. The kernel in earlier releases did not support x2APIC and could not handle 256 threads. Not an issue on

[OMPI users] Parameters at run time

2019-10-19 Thread Raymond Muno via users
Is there a way to determine, at run time, what choices OpenMPI made in terms of the transports being used? We want to verify we are running UCX over InfiniBand. I have two users, executing the identical code, with the same mpirun options, getting vastly different execution time
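
A couple of ways to check this at run time, sketched under standard Open MPI/UCX assumptions (the process count and application name are placeholders): forcing the UCX PML makes the job abort if UCX cannot actually be selected, and the verbosity knobs report which PML/BTL were chosen.

    # fail fast if UCX is not usable, and print selection details
    mpirun --mca pml ucx --mca pml_base_verbose 10 \
           --mca btl_base_verbose 10 -np 64 ./a.out
    # UCX's own view of available transports/devices on a node
    ucx_info -d | grep -i -E 'transport|device'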

Re: [OMPI users] UCX errors after upgrade

2019-10-02 Thread Raymond Muno via users
) wrote: Thanks Raymond; I have filed an issue for this on GitHub and tagged the relevant Mellanox people: https://github.com/open-mpi/ompi/issues/7009 On Sep 25, 2019, at 3:09 PM, Raymond Muno via users <users@lists.open-mpi.org> wrote: We are running against 4.0.2RC2 now. T

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
As a test, I rebooted a set of nodes. The user could run on 480 cores, on 5 nodes. We could not run beyond two nodes before that. We still get the VM_UNMAP warning, however. On 9/25/19 2:09 PM, Raymond Muno via users wrote: We are running against 4.0.2RC2 now. This is using current
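
For what it's worth, one way to narrow down whether UCX itself is the source of such warnings is sketched below (standard Open MPI selection knobs and the UCX_LOG_LEVEL environment variable; the process count and executable are placeholders):

    # run via the ob1 PML instead of UCX to see whether the problems persist
    mpirun --mca pml ob1 -np 96 ./a.out
    # or keep UCX but raise its log level for diagnostics
    mpirun --mca pml ucx -x UCX_LOG_LEVEL=debug -np 96 ./a.out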

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
od bug fixes in there since v4.0.1. On Sep 25, 2019, at 2:12 PM, Raymond Muno via users <users@lists.open-mpi.org> wrote: We are primarily using OpenMPI 3.1.4 but also have 4.0.1 installed. On our cluster, we were running CentOS 7.5 with updates, alongside MLNX_OFED 4.5.x.

[OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
We are primarily using OpenMPI 3.1.4 but also have 4.0.1 installed. On our cluster, we were running CentOS 7.5 with updates, alongside MLNX_OFED 4.5.x.   OpenMPI was compiled with GCC, Intel, PGI and AOCC compilers. We could run with no issues. To accommodate updates needed to get our IB gear

Re: [OMPI users] Building OpenMPI with Lustre support using PGI fails

2018-11-27 Thread Raymond Muno
I apologize. I did not realize that I did not reply to the list. Going with the view that this is a PGI problem, I noticed they recently released version 18.10. I had just installed 18.7 within the last couple of weeks. The problem is resolved in 18.10. -Ray Muno On 11/27/18 7:55 PM, Gilles Gouail

[OMPI users] Building OpenMPI with Lustre support using PGI fails

2018-11-13 Thread Raymond Muno
I am trying to build OpenMPI with Lustre support using PGI 18.7 on CentOS 7.5 (1804). It builds successfully with Intel compilers, but fails to find the necessary Lustre components with the PGI compiler. I have tried building OpenMPI 4.0.0, 3.1.3 and 2.1.5. I can build OpenMPI, but conf
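
For context, Lustre support is requested through configure's --with-lustre option; a minimal sketch of the kind of invocation involved (install prefix and Lustre location are placeholders, and config.log is where the details of a failed Lustre probe end up):

    ./configure CC=pgcc CXX=pgc++ FC=pgfortran \
        --with-lustre=/usr --prefix=/opt/openmpi-4.0.0-pgi
    # if the Lustre check fails, inspect the probe output
    grep -A5 -i lustre config.log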

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-21 Thread Raymond Muno
On 10/20/2010 8:30 PM, Scott Atchley wrote: We have fixed this bug in the most recent 1.4.x and 1.5.x releases. Scott OK, a few more tests. I was using PGI 10.4 as the compiler. I have now tried OpenMPI 1.4.3 with PGI 10.8 and Intel 11.1. I get the same results in each case, mpirun seg faul

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno
On 10/20/2010 8:30 PM, Scott Atchley wrote: Are you building OMPI with support for both MX and IB? If not and you only want MX support, try configuring OMPI using --disable-memory-manager (check configure for the exact option). We have fixed this bug in the most recent 1.4.x and 1.5.x releases
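
If memory serves, the option Scott is referring to is spelled --without-memory-manager on the 1.4 series (do check ./configure --help as he suggests). A sketch of an MX-only build with it, where the prefix and MX install path are placeholders:

    ./configure --prefix=/opt/openmpi-1.4.2 \
        --with-mx=/opt/mx --without-memory-manager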

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno
On 10/20/2010 8:30 PM, Scott Atchley wrote: On Oct 20, 2010, at 9:22 PM, Raymond Muno wrote: On 10/20/2010 7:59 PM, Ralph Castain wrote: The error message seems to imply that mpirun itself didn't segfault, but that something else did. Is that segfault pid from mpirun? This kind of pr

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno
On 10/20/2010 7:59 PM, Ralph Castain wrote: The error message seems to imply that mpirun itself didn't segfault, but that something else did. Is that segfault pid from mpirun? This kind of problem usually is caused by mismatched builds - i.e., you compile against your new build, but you pick
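
A quick way to check for the mismatched-build situation Ralph describes (nothing here is specific to this report; the binary name is illustrative) is to confirm that the mpirun on your PATH and the MPI library the application links against come from the same installation:

    which mpirun && mpirun --version
    ldd ./a.out | grep -i libmpi    # should resolve into the same install prefix
    echo $LD_LIBRARY_PATH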

[OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno
We are doing a test build of a new cluster. We are re-using our Myrinet 10G gear from a previous cluster. I have built OpenMPI 1.4.2 with PGI 10.4. We use this regularly on our InfiniBand-based cluster and all the install elements were readily available. With a few go-arounds with the My

Re: [OMPI users] Problem building OpenMPI with SunStudio compilers

2008-10-04 Thread Raymond Muno
Raymond Muno wrote: Raymond Muno wrote: We are implementing a new cluster that is InfiniBand based. I am working on getting OpenMPI built for our various compile environments. So far it is working for PGI 7.2 and PathScale 3.1. I found some workarounds for issues with the Pathscale

Re: [OMPI users] Problem building OpenMPI with SunStudio compilers

2008-10-04 Thread Raymond Muno
Raymond Muno wrote: We are implementing a new cluster that is InfiniBand based. I am working on getting OpenMPI built for our various compile environments. So far it is working for PGI 7.2 and PathScale 3.1. I found some workarounds for issues with the PathScale compilers (seg faults) in

[OMPI users] Problem building OpenMPI with SunStudio compilers

2008-10-04 Thread Raymond Muno
We are implementing a new cluster that is InfiniBand based. I am working on getting OpenMPI built for our various compile environments. So far it is working for PGI 7.2 and PathScale 3.1. I found some workarounds for issues with the PathScale compilers (seg faults) in the OpenMPI FAQ. When
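
For reference, building Open MPI with Sun Studio generally comes down to pointing configure at the Studio drivers. A minimal sketch under those assumptions (the install prefix is a placeholder, f95 is used for both Fortran variables, and any extra flags a particular Studio release needs are not shown):

    ./configure CC=cc CXX=CC F77=f95 FC=f95 \
        --prefix=/opt/openmpi-sun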