[OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno
 We are doing a test build of a new cluster. We are re-using our 
Myrinet 10G gear from a previous cluster.


I have built OpenMPI 1.4.2 with PGI 10.4. We use this combination 
regularly on our InfiniBand-based cluster, and all the install elements 
were readily available.


After a few go-arounds with the Myrinet MX stack, we are now running MX 
1.2.12, configured to allow more than the default maximum of 16 endpoints. 
Each node has 24 cores.


The cluster is running Rocks 5.3.

As part of the initial build, I installed the Myrinet_MX Rocks Roll from 
Myricom. With the default limitation of 16 endpoints, we could not run 
on all nodes. As mentioned above, the MX stack was replaced.


Myricom provided a build of OpenMPI 1.4.1. That build works, but it is 
compiled only with gcc and gfortran, and we want builds with the 
compilers we normally use: PGI, PathScale, and Intel.


We can compile with the OpenMPI 1.4.2 / PGI 10.4 build. However, we 
cannot launch jobs with mpirun; it segfaults.


--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
[enet1-head2-eth1:29532] *** Process received signal ***
[enet1-head2-eth1:29532] Signal: Segmentation fault (11)
[enet1-head2-eth1:29532] Signal code: Address not mapped (1)
[enet1-head2-eth1:29532] Failing at address: 0x6c
[enet1-head2-eth1:29532] *** End of error message ***
Segmentation fault

However, if we launch the job with the Myricom-supplied mpirun from their 
OpenMPI tree, the job runs successfully. This works even with a test 
program compiled with the OpenMPI 1.4.2 / PGI 10.4 build.
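
That is, launching by full path with Myricom's mpirun runs the PGI-built 
test binary. A minimal illustration (the executable and hostfile names 
are placeholders):

   /opt/openmpi-myrinet_mx/bin/mpirun -np 48 -hostfile hosts ./hello_mx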





Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno

 On 10/20/2010 7:59 PM, Ralph Castain wrote:

The error message seems to imply that mpirun itself didn't segfault, but that 
something else did. Is that segfault pid from mpirun?

This kind of problem usually is caused by mismatched builds - i.e., you compile 
against your new build, but you pick up the Myrinet build when you try to run 
because of path and ld_library_path issues. You might check to ensure you are 
running against what you built with.
The PATH and LD_LIBRARY_PATH are set explicitly (through modules) on the 
frontend and each node. The PGI compiler and the OpenMPI install I am 
trying to run are set for each.
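
A quick sanity check along those lines, using standard commands 
(ompi_info reports the version and install prefix of whichever build is 
first in the PATH):

   which mpirun
   ompi_info | grep -E 'Open MPI:|Prefix:'
   echo $LD_LIBRARY_PATH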


ldd /share/apps/opt/OpenMPI/1.4.2/PGI/10.4/bin/mpirun

libopen-rte.so.0 => 
/share/apps/opt/OpenMPI/1.4.2/PGI/10.4/lib/libopen-rte.so.0 
(0x2b6a16552000)
libopen-pal.so.0 => 
/share/apps/opt/OpenMPI/1.4.2/PGI/10.4/lib/libopen-pal.so.0 
(0x2b6a167aa000)

libdl.so.2 => /lib64/libdl.so.2 (0x003a7dc0)
libnsl.so.1 => /lib64/libnsl.so.1 (0x003a8040)
libutil.so.1 => /lib64/libutil.so.1 (0x003a88a0)
libpthread.so.0 => /lib64/libpthread.so.0 (0x003a7e00)
libm.so.6 => /lib64/libm.so.6 (0x003a7d80)
libc.so.6 => /lib64/libc.so.6 (0x003a7d40)
libpgc.so => 
/share/apps/opt/PGI/10.4/linux86-64/10.4/libso/libpgc.so 
(0x2b6a16a28000)

/lib64/ld-linux-x86-64.so.2 (0x003a7d00)

For comparison, the one that works, from the other tree:

ldd /opt/openmpi-myrinet_mx/bin/mpirun
libopen-rte.so.0 => 
/opt/openmpi-myrinet_mx/lib/libopen-rte.so.0 (0x2b51c71b)
libopen-pal.so.0 => 
/opt/openmpi-myrinet_mx/lib/libopen-pal.so.0 (0x2b51c743)

libdl.so.2 => /lib64/libdl.so.2 (0x003a7dc0)
libnsl.so.1 => /lib64/libnsl.so.1 (0x003a8040)
libutil.so.1 => /lib64/libutil.so.1 (0x003a88a0)
libm.so.6 => /lib64/libm.so.6 (0x003a7d80)
libpthread.so.0 => /lib64/libpthread.so.0 (0x003a7e00)
libc.so.6 => /lib64/libc.so.6 (0x003a7d40)
/lib64/ld-linux-x86-64.so.2 (0x003a7d00)



Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno

 On 10/20/2010 8:30 PM, Scott Atchley wrote:

On Oct 20, 2010, at 9:22 PM, Raymond Muno wrote:


On 10/20/2010 7:59 PM, Ralph Castain wrote:

The error message seems to imply that mpirun itself didn't segfault, but that 
something else did. Is that segfault pid from mpirun?

This kind of problem usually is caused by mismatched builds - i.e., you compile 
against your new build, but you pick up the Myrinet build when you try to run 
because of path and ld_library_path issues. You might check to ensure you are 
running against what you built with.

The PATH and LD_LIBRARY_PATH are set explicitly (through modules) on the 
frontend and each node. The PGI compiler and the OpenMPI install I am 
trying to run are set for each.


Are you building OMPI with support for both MX and IB? If not and you only want 
MX support, try configuring OMPI using --disable-memory-manager (check 
configure for the exact option).

We have fixed this bug in the most recent 1.4.x and 1.5.x releases.

Scott

I just downloaded 1.4.3 and compiled it with PGI 10.4.   I get the same 
result.


I did confirm that the process ID shown is that of mpirun.

This cluster has only Myrinet. The install is a fresh build, separate 
from the IB cluster. I will try the configure option.


Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno

 On 10/20/2010 8:30 PM, Scott Atchley wrote:

Are you building OMPI with support for both MX and IB? If not and you only want 
MX support, try configuring OMPI using --disable-memory-manager (check 
configure for the exact option).

We have fixed this bug in the most recent 1.4.x and 1.5.x releases.

Scott


Hmmm, not sure which configure option you want me to try.

$ ./configure --help | grep memory
  --enable-mem-debug      enable memory debugging (debugging only) (default:
  --enable-mem-profile    enable memory profiling (debugging only) (default:
  --enable-memchecker     Enable memory and buffer checks. Note that disabling
  --with-memory-manager=TYPE
                          Use TYPE for intercepting memory management calls to
                          control memory pinning.

$ ./configure --help | grep disable
  --cache-file=FILE       cache test results in FILE [disabled]
  --disable-option-checking  ignore unrecognized --enable/--with options
  --disable-FEATURE       do not include FEATURE (same as --enable-FEATURE=no)
                          disabled)
                          disabled)
                          building Open MPI (default: disabled)
                          general MPI users!) (default: disabled)
  --disable-debug-symbols Disable adding compiler flags to enable debugging
  --enable-peruse         Support PERUSE interface (default: disabled)
  --enable-pty-support    Enable/disable PTY support for STDIO forwarding.
                          dlopen implies --disable-mca-dso. (default: enabled)
                          support (default: disabled)
                          MPI applications (default: disabled)
                          This option ignores the --disable-binaries option
  --disable-ipv6          Disable IPv6 support (default: enabled, but only if
  --disable-dependency-tracking  speeds up one-time build
                          disabled)
  --enable-smp-locks      enable smp locks in atomic ops. Do not disable if
                          disabled)
  --disable-ft-thread     Disable fault tolerance thread running inside all
                          disable building all maffinity components and the
                          as static disables it building as a DSO.
  --enable-mca-no-build   list (default: disabled)
  --disable-executables   Using --disable-executables disables building and
                          --disable-included-mode, meaning that the PLPA is in
                          InfiniBand ConnectX adapters, you may disable the
                          (default: disabled)
  --disable-mpi-io        Disable built-in support for MPI-2 I/O, likely
  --disable-io-romio      Disable the ROMIO MPI-IO component
                          "--enable-contrib-no-build=libtrace,vt" will disable
  --disable-libtool-lock  avoid locking (might break parallel builds)
                          (default: disabled).
                          (default: disabled)
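
From the output above, the option presumably intended is the 
memory-manager one. A minimal sketch of the configure line, assuming the 
standard PGI compiler names and the install prefix shown earlier in this 
thread:

   ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 \
       --with-memory-manager=none \
       --prefix=/share/apps/opt/OpenMPI/1.4.2/PGI/10.4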



Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-21 Thread Raymond Muno

On 10/20/2010 8:30 PM, Scott Atchley wrote:

We have fixed this bug in the most recent 1.4.x and 1.5.x releases.

Scott

OK, a few more tests. I was using PGI 10.4 as the compiler.

I have now tried OpenMPI 1.4.3 with PGI 10.8 and Intel 11.1. I get the 
same result in each case: mpirun segfaults. (I really did not expect that 
to change anything.)


I tried OpenMPI 1.5. Under PGI, I could not get it to compile. With 
Intel 11.1, it compiles, but when I try to run a simple test, mpirun just 
seems to hang and I never see anything start on the nodes. I would 
rather stick with 1.4.x for now, since that is what we are running on our 
other production cluster. I will leave this for a later day.


I grabbed the 1.4.3 version from this page.

http://www.open-mpi.org/software/ompi/v1.4/

When you say this bug is fixed in recent  1.4.x releases,  should I try 
one from here?


http://www.open-mpi.org/nightly/v1.4/

For grins, I compiled the OpenMPI 1.4.1 tree. This is what Myricom 
supplied with the MX roll. Same result. I can still run with their 
compiled version of mpirun, even when I compile with the other build 
trees and compilers. I just do not know what options they compiled with.


Any insight would be appreciated.

-Ray Muno
 University of Minnesota


[OMPI users] Problem building OpenMPI with SunStudio compilers

2008-10-04 Thread Raymond Muno
We are implementing a new cluster that is InfiniBand-based. I am 
working on getting OpenMPI built for our various compile environments. 
So far it is working for PGI 7.2 and PathScale 3.1. I found some 
workarounds for issues with the PathScale compilers (seg faults) in the 
OpenMPI FAQ.


When I try to build with SunStudio, I cannot even get past the configure 
stage. It dies in the stage that checks for C++.


*** C++ compiler and preprocessor
checking whether we are using the GNU C++ compiler... no
checking whether CC accepts -g... yes
checking dependency style of CC... none
checking how to run the C++ preprocessor... CC -E
checking for the C++ compiler vendor... sun
checking if C++ compiler works... no
**
* It appears that your C++ compiler is unable to produce working
* executables.  A simple test application failed to properly
* execute.  Note that this is likely not a problem with Open MPI,
* but a problem with the local compiler installation.  More
* information (including exactly what command was given to the
* compiler and what error resulted when the command was executed) is
* available in the config.log file in this directory.
**
configure: error: Could not run a simple C++ program.  Aborting.

The relevant section in config.log looks to be:


configure:21722: CC -c -DNDEBUG   conftest.cpp >&5
configure:21728: $? = 0
configure:21907: result: sun
configure:21929: checking if C++ compiler works
configure:22006: CC -o conftest -DNDEBUG   conftest.cpp  >&5
/usr/lib64/libm.so: file not recognized: File format not recognized
configure:22009: $? = 1
configure: program exited with status 1
configure: failed program was:
=

The attempt to configure was done with:

./configure CC=cc CXX=CC F77=f77 FC=f90 --prefix=path_to_install

All the SunStudio binaries are at the front of the path. 


I find this entry in the FAQ for the SunStudio compilers

http://www.open-mpi.org/faq/?category=building#build-sun-compilers

and followed that as well, with no success.  It still dies at the 
configure step. 

The SunStudio version is 12. The target (and compilation) platform is 
AMD Opteron, Barcelona.  We have been using the SunStudio compilers on 
this cluster on a routine basis and have not had issues.  





Re: [OMPI users] Problem building OpenMPI with SunStudio compilers

2008-10-04 Thread Raymond Muno

Raymond Muno wrote:
We are implementing a new cluster that is InfiniBand-based. I am 
working on getting OpenMPI built for our various compile environments. 
So far it is working for PGI 7.2 and PathScale 3.1. I found some 
workarounds for issues with the PathScale compilers (seg faults) in 
the OpenMPI FAQ.


When I try to build with SunStudio, I cannot even get past the 
configure stage. It dies in the stage that checks for C++.


...


It looks like the problem is with SunStudio itself. Even a simple CC 
program fails to compile.


/usr/lib64/libm.so: file not recognized: File format not recognized




Re: [OMPI users] Problem building OpenMPI with SunStudio compilers

2008-10-04 Thread Raymond Muno

Raymond Muno wrote:

Raymond Muno wrote:
We are implementing a new cluster that is InfiniBand-based. I am 
working on getting OpenMPI built for our various compile 
environments. So far it is working for PGI 7.2 and PathScale 3.1. I 
found some workarounds for issues with the PathScale compilers (seg 
faults) in the OpenMPI FAQ.


When I try to build with SunStudio, I cannot even get past the 
configure stage. It dies in the stage that checks for C++.


It looks like the problem is with SunStudio itself. Even a simple CC 
program fails to compile.


/usr/lib64/libm.so: file not recognized: File format not recognized
OK, I took care of the linker issue for C++ as recommended on Sun's 
support site (replace the Sun-supplied ld with /usr/bin/ld).
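
For the record, a sketch of one way to apply that workaround (the Sun 
Studio install path here is hypothetical; back up the original first):

   cd /opt/sunstudio12/prod/bin    # hypothetical Sun Studio bin directory
   mv ld ld.orig
   ln -s /usr/bin/ld ld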


Now I get farther along, but the build fails at (small excerpt):

mutex.c:(.text+0x30): multiple definition of `opal_atomic_cmpset_32'
asm/.libs/libasm.a(asm.o):asm.c:(.text+0x30): first defined here
threads/.libs/mutex.o: In function `opal_atomic_cmpset_64':
mutex.c:(.text+0x50): multiple definition of `opal_atomic_cmpset_64'
asm/.libs/libasm.a(asm.o):asm.c:(.text+0x50): first defined here
make[2]: *** [libopen-pal.la] Error 1
make[2]: Leaving directory `/home/muno/OpenMPI/SunStudio/openmpi-1.2.7/opal'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/muno/OpenMPI/SunStudio/openmpi-1.2.7/opal'
make: *** [all-recursive] Error 1

I based the configure on what was found in the FAQ here.

http://www.open-mpi.org/faq/?category=building#build-sun-compilers

Perhaps this is much more specific to our platform/OS.

The environment is AMD Opteron, Barcelona running Centos 5 (Rocks 5.03) 
with SunStudio 12 compilers.


Does anyone have any insight as to how to successfully build OpenMPI for 
this OS/compiler selection?  As I said in the first post, we have it 
built for Pathscale 3.1 and PGI 7.2. 


-Ray Muno
University of Minnesota, Aerospace Engineering






[OMPI users] Building OpenMPI with Lustre support using PGI fails

2018-11-13 Thread Raymond Muno
I am trying  to build OpenMPI with Lustre support using PGI 18.7 on 
CentOS 7.5 (1804).


It builds successfully with Intel compilers, but fails to find the 
necessary  Lustre components with the PGI compiler.


I have tried building  OpenMPI 4.0.0, 3.1.3 and 2.1.5.   I can build 
OpenMPI, but configure does not find the proper Lustre files.


Lustre is installed from current client RPMS, version 2.10.5

Include files are in /usr/include/lustre

When specifying --with-lustre, I get:

--- MCA component fs:lustre (m4 configuration macro)
checking for MCA component fs:lustre compile mode... dso
checking --with-lustre value... simple ok (unspecified value)
looking for header without includes
checking lustre/lustreapi.h usability... yes
checking lustre/lustreapi.h presence... yes
checking for lustre/lustreapi.h... yes
checking for library containing llapi_file_create... -llustreapi
checking if liblustreapi requires libnl v1 or v3...
checking for required lustre data structures... no
configure: error: Lustre support requested but not found. Aborting
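
For reference, the configure invocation is essentially the following 
sketch (compiler names are assumptions; --with-lustre is the relevant 
flag, and passing an explicit path such as --with-lustre=/usr may be 
worth trying, since the headers live under /usr/include/lustre):

   ./configure CC=pgcc CXX=pgc++ FC=pgfortran --with-lustre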


--
 
 Ray Muno

 IT Manager
 


  University of Minnesota
 Aerospace Engineering and Mechanics Mechanical Engineering
 110 Union St. S.E.  111 Church Street SE
 Minneapolis, MN 55455   Minneapolis, MN 55455


Re: [OMPI users] Building OpenMPI with Lustre support using PGI fails

2018-11-27 Thread Raymond Muno

I apologize; I did not realize that I had not replied to the list.

Going with the view that this is a PGI problem, I noticed they recently 
released version 18.10. I had just installed 18.7 within the last couple 
of weeks.


The problem is resolved in 18.10.

-Ray Muno

On 11/27/18 7:55 PM, Gilles Gouaillardet wrote:

Folks,


sorry for the late follow-up. The config.log was indeed sent offline.


Here is the relevant part :


configure:294375: checking for required lustre data structures
configure:294394: pgcc -O -DNDEBUG   -Iyes/include -c conftest.c
PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 157)
PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 157)
PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 158)
PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 158)
PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 159)
PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 159)
PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 160)
PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 160)
PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 161)
PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 161)
PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 162)
PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 162)
PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 163)
PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 163)
PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 164)
PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 164)
PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 211)
PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 211)
PGC-S-0040-Illegal use of symbol, u_int64_t (/usr/include/sys/quota.h: 212)
PGC-W-0156-Type not specified, 'int' assumed (/usr/include/sys/quota.h: 212)

PGC/x86-64 Linux 18.7-0: compilation completed with severe errors
configure:294401: $? = 2
configure:294415: result: no
configure:294424: error: Lustre support requested but not found. Aborting


Here is the conftest.c that triggers the error:

#include "lustre/lustreapi.h"
void alloc_lum()
{
  int v1, v3;
  v1 = sizeof(struct lov_user_md_v1) +
       LOV_MAX_STRIPE_COUNT * sizeof(struct lov_user_ost_data_v1);
  v3 = sizeof(struct lov_user_md_v3) +
       LOV_MAX_STRIPE_COUNT * sizeof(struct lov_user_ost_data_v1);
}


The same code was reported to work with the gcc compiler, so at this 
stage it looks like a PGI or an environment issue (sometimes the sysadmin 
has to re-run makelocalrc if some dependencies have changed), so I 
recommend this error be submitted to PGI support.

I reviewed the code and filed a PR that gets rid of the "-Iyes/include" 
flag.

Merged or not, that does not fix the real issue here.


Cheers,


Gilles


On 11/28/2018 6:04 AM, Gabriel, Edgar wrote:
Gilles submitted a patch for that, and I approved it a couple of days 
back; I *think* it has not been merged yet, however. This was a bug in 
the Open MPI Lustre configure logic and should be fixed once that patch 
is merged.


https://github.com/open-mpi/ompi/pull/6080

Thanks
Edgar


-Original Message-
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of 
Latham,

Robert J. via users
Sent: Tuesday, November 27, 2018 2:03 PM
To: users@lists.open-mpi.org
Cc: Latham, Robert J. ; gi...@rist.or.jp
Subject: Re: [OMPI users] Building OpenMPI with Lustre support using 
PGI fails


On Tue, 2018-11-13 at 21:57 -0600, gil...@rist.or.jp wrote:

Raymond,

can you please compress and post your config.log ?
I didn't see the config.log in response to this. Maybe Ray and Gilles 
took the discussion off list? As someone who might have introduced the 
offending configure-time checks, I'm particularly interested in fixing 
Lustre detection.


==rob



Cheers,

Gilles

- Original Message -

I am trying  to build OpenMPI with Lustre support using PGI 18.7 on
CentOS 7.5 (1804).

It builds successfully with Intel compilers, but fails to find the
necessary  Lustre components with the PGI compiler.

I have tried building  OpenMPI 4.0.0, 3.1.3 and 2.1.5.   I can
build
OpenMPI, but configure does not find the proper Lustre files.

Lustre is installed from current client RPMS, version 2.10.5

Include files are in /usr/include/lustre

When specifying --with-lustre, I get:

--- MCA component fs:lustre (m4 configuration macro) checking for
MCA component fs:lustre compile mode... dso checking --with-lustre
value... simple ok (unspecified value) looking for header without
includes checking lustre/lustreapi.h usability... yes checking
lustre/lustreapi.h presence... yes checking for
lustre/lustreapi.h... yes checking for library containing
llapi_file_create... -llustreapi checking if liblust

[OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users

We are primarily using OpenMPI 3.1.4 but also have 4.0.1 installed.

On our cluster, we were running CentOS 7.5 with updates, alongside 
MLNX_OFED 4.5.x.   OpenMPI was compiled with GCC, Intel, PGI and AOCC 
compilers. We could run with no issues.


To accommodate updates needed to get our IB gear all running at HDR100 
(EDR50 previously) we upgraded to CentOS 7.6.1810 and the current 
MLNX_OFED 4.6.x.


We can no longer reliably run on more than two nodes.

We see errors like:

[epyc-compute-3-2.local:42447] pml_ucx.c:380  Error: 
ucp_ep_create(proc=276) failed: Destination is unreachable
[epyc-compute-3-2.local:42447] pml_ucx.c:447  Error: Failed to resolve 
UCX endpoint for rank 276

[epyc-compute-3-2:42447] *** An error occurred in MPI_Allreduce
[epyc-compute-3-2:42447] *** reported by process 
[47894553493505,47893180318004]

[epyc-compute-3-2:42447] *** on communicator MPI_COMM_WORLD
[epyc-compute-3-2:42447] *** MPI_ERR_OTHER: known error not in list
[epyc-compute-3-2:42447] *** MPI_ERRORS_ARE_FATAL (processes in this 
communicator will now abort,

[epyc-compute-3-2:42447] ***    and potentially your MPI job)
[epyc-compute-3-17.local:36637] PMIX ERROR: UNREACHABLE in file 
server/pmix_server.c at line 2079
[epyc-compute-3-17.local:37008] pml_ucx.c:380  Error: 
ucp_ep_create(proc=147) failed: Destination is unreachable
[epyc-compute-3-17.local:37008] pml_ucx.c:447  Error: Failed to resolve 
UCX endpoint for rank 147
[epyc-compute-3-7.local:39776] 1 more process has sent help message 
help-mpi-errors.txt / mpi_errors_are_fatal
[epyc-compute-3-7.local:39776] Set MCA parameter 
"orte_base_help_aggregate" to 0 to see all help / error messages


UCX appears to be part of the MLNX_OFED release, and is version 1.6.0.

OpenMPI is built on the same OS and MLNX_OFED as we are running on the 
compute nodes.


I have a case open with Mellanox but it is not clear where this error is 
coming from.
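
A quick way to see what UCX itself detects on each node is the ucx_info 
utility that ships with MLNX_OFED (a sketch; output format varies by UCX 
release):

   ucx_info -v                        # report the UCX version in use
   ucx_info -d | grep -i transport    # list available transports/devices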


--
 
 Ray Muno

 IT Manager
 e-mail:m...@aem.umn.edu
 Phone:   (612) 625-9531

  University of Minnesota
 Aerospace Engineering and Mechanics Mechanical Engineering
 110 Union St. S.E.  111 Church Street SE
 Minneapolis, MN 55455   Minneapolis, MN 55455



Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
We are running against 4.0.2RC2 now. This is using the current Intel 
compilers, version 2019 Update 4. Still having issues.


[epyc-compute-1-3.local:17402] common_ucx.c:149  Warning: UCX is unable 
to handle VM_UNMAP event. This may cause performance degradation or data 
corruption.
[epyc-compute-1-3.local:17669] common_ucx.c:149  Warning: UCX is unable 
to handle VM_UNMAP event. This may cause performance degradation or data 
corruption.
[epyc-compute-1-3.local:17683] common_ucx.c:149  Warning: UCX is unable 
to handle VM_UNMAP event. This may cause performance degradation or data 
corruption.
[epyc-compute-1-3.local:16626] pml_ucx.c:385  Error: 
ucp_ep_create(proc=265) failed: Destination is unreachable
[epyc-compute-1-3.local:16626] pml_ucx.c:452  Error: Failed to resolve 
UCX endpoint for rank 265

[epyc-compute-1-3:16626] *** An error occurred in MPI_Allreduce
[epyc-compute-1-3:16626] *** reported by process 
[47001162088449,46999827120425]

[epyc-compute-1-3:16626] *** on communicator MPI_COMM_WORLD
[epyc-compute-1-3:16626] *** MPI_ERR_OTHER: known error not in list
[epyc-compute-1-3:16626] *** MPI_ERRORS_ARE_FATAL (processes in this 
communicator will now abort,

[epyc-compute-1-3:16626] *** and potentially your MPI job)


On 9/25/19 1:28 PM, Jeff Squyres (jsquyres) via users wrote:
Can you try the latest 4.0.2rc tarball?  We're very, very close to 
releasing v4.0.2...


I don't know if there's a specific UCX fix in there, but there are a 
ton of other good bug fixes in there since v4.0.1.



On Sep 25, 2019, at 2:12 PM, Raymond Muno via users 
<users@lists.open-mpi.org> wrote:


We are primarily using OpenMPI 3.1.4 but also have 4.0.1 installed.

On our cluster, we were running CentOS 7.5 with updates, alongside 
MLNX_OFED 4.5.x.   OpenMPI was compiled with GCC, Intel, PGI and AOCC 
compilers. We could run with no issues.


To accommodate updates needed to get our IB gear all running at 
HDR100 (EDR50 previously) we upgraded to CentOS 7.6.1810 and the 
current MLNX_OFED 4.6.x.


We can no longer reliably run on more than two nodes.

We see errors like:

[epyc-compute-3-2.local:42447] pml_ucx.c:380  Error: 
ucp_ep_create(proc=276) failed: Destination is unreachable
[epyc-compute-3-2.local:42447] pml_ucx.c:447  Error: Failed to 
resolve UCX endpoint for rank 276

[epyc-compute-3-2:42447] *** An error occurred in MPI_Allreduce
[epyc-compute-3-2:42447] *** reported by process 
[47894553493505,47893180318004]

[epyc-compute-3-2:42447] *** on communicator MPI_COMM_WORLD
[epyc-compute-3-2:42447] *** MPI_ERR_OTHER: known error not in list
[epyc-compute-3-2:42447] *** MPI_ERRORS_ARE_FATAL (processes in this 
communicator will now abort,

[epyc-compute-3-2:42447] ***    and potentially your MPI job)
[epyc-compute-3-17.local:36637] PMIX ERROR: UNREACHABLE in file 
server/pmix_server.c at line 2079
[epyc-compute-3-17.local:37008] pml_ucx.c:380  Error: 
ucp_ep_create(proc=147) failed: Destination is unreachable
[epyc-compute-3-17.local:37008] pml_ucx.c:447  Error: Failed to 
resolve UCX endpoint for rank 147
[epyc-compute-3-7.local:39776] 1 more process has sent help message 
help-mpi-errors.txt / mpi_errors_are_fatal
[epyc-compute-3-7.local:39776] Set MCA parameter 
"orte_base_help_aggregate" to 0 to see all help / error messages


UCX appears to be part of the MLNX_OFED release, and is version 1.6.0.

OpenMPI is built on the same OS and MLNX_OFED as we are running 
on the compute nodes.


I have a case open with Mellanox but it is not clear where this error 
is coming from.

--




--
Jeff Squyres
jsquy...@cisco.com


--
 
 Ray Muno

 IT Manager
 University of Minnesota
 Aerospace Engineering and Mechanics Mechanical Engineering
 



Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
As a test, I rebooted a set of nodes. The user could then run on 480 
cores across 5 nodes. We could not run beyond two nodes before that.


We still get the VM_UNMAP warning, however.

On 9/25/19 2:09 PM, Raymond Muno via users wrote:


We are running against 4.0.2RC2 now. This is using the current Intel 
compilers, version 2019 Update 4. Still having issues.


[epyc-compute-1-3.local:17402] common_ucx.c:149  Warning: UCX is 
unable to handle VM_UNMAP event. This may cause performance 
degradation or data corruption.
[epyc-compute-1-3.local:17669] common_ucx.c:149  Warning: UCX is 
unable to handle VM_UNMAP event. This may cause performance 
degradation or data corruption.
[epyc-compute-1-3.local:17683] common_ucx.c:149  Warning: UCX is 
unable to handle VM_UNMAP event. This may cause performance 
degradation or data corruption.
[epyc-compute-1-3.local:16626] pml_ucx.c:385  Error: 
ucp_ep_create(proc=265) failed: Destination is unreachable
[epyc-compute-1-3.local:16626] pml_ucx.c:452  Error: Failed to resolve 
UCX endpoint for rank 265

[epyc-compute-1-3:16626] *** An error occurred in MPI_Allreduce
[epyc-compute-1-3:16626] *** reported by process 
[47001162088449,46999827120425]

[epyc-compute-1-3:16626] *** on communicator MPI_COMM_WORLD
[epyc-compute-1-3:16626] *** MPI_ERR_OTHER: known error not in list
[epyc-compute-1-3:16626] *** MPI_ERRORS_ARE_FATAL (processes in this 
communicator will now abort,

[epyc-compute-1-3:16626] ***    and potentially your MPI job)




--
 
 Ray Muno

 IT Manager
 University of Minnesota
 Aerospace Engineering and Mechanics Mechanical Engineering
 



Re: [OMPI users] UCX errors after upgrade

2019-10-02 Thread Raymond Muno via users
We are now using OpenMPI 4.0.2RC2 and RC3, compiled (with Intel, PGI and 
GCC) against MLNX_OFED 4.7 (released a couple of days ago), which 
supplies UCX 1.7. So far, it seems like things are working well.


Any estimate on when OpenMPI 4.0.2 will be released?


On 9/25/19 2:27 PM, Jeff Squyres (jsquyres) wrote:
Thanks Raymond; I have filed an issue for this on Github and tagged 
the relevant Mellanox people:


https://github.com/open-mpi/ompi/issues/7009


On Sep 25, 2019, at 3:09 PM, Raymond Muno via users 
<users@lists.open-mpi.org> wrote:


We are running against 4.0.2RC2 now. This is using the current Intel 
compilers, version 2019 Update 4. Still having issues.


[epyc-compute-1-3.local:17402] common_ucx.c:149  Warning: UCX is 
unable to handle VM_UNMAP event. This may cause performance 
degradation or data corruption.
[epyc-compute-1-3.local:17669] common_ucx.c:149  Warning: UCX is 
unable to handle VM_UNMAP event. This may cause performance 
degradation or data corruption.
[epyc-compute-1-3.local:17683] common_ucx.c:149  Warning: UCX is 
unable to handle VM_UNMAP event. This may cause performance 
degradation or data corruption.
[epyc-compute-1-3.local:16626] pml_ucx.c:385  Error: 
ucp_ep_create(proc=265) failed: Destination is unreachable
[epyc-compute-1-3.local:16626] pml_ucx.c:452  Error: Failed to 
resolve UCX endpoint for rank 265

[epyc-compute-1-3:16626] *** An error occurred in MPI_Allreduce
[epyc-compute-1-3:16626] *** reported by process 
[47001162088449,46999827120425]

[epyc-compute-1-3:16626] *** on communicator MPI_COMM_WORLD
[epyc-compute-1-3:16626] *** MPI_ERR_OTHER: known error not in list
[epyc-compute-1-3:16626] *** MPI_ERRORS_ARE_FATAL (processes in this 
communicator will now abort,

[epyc-compute-1-3:16626] ***    and potentially your MPI job)


On 9/25/19 1:28 PM, Jeff Squyres (jsquyres) via users wrote:
Can you try the latest 4.0.2rc tarball?  We're very, very close to 
releasing v4.0.2...


I don't know if there's a specific UCX fix in there, but there are a 
ton of other good bug fixes in there since v4.0.1.



On Sep 25, 2019, at 2:12 PM, Raymond Muno via users 
<users@lists.open-mpi.org> wrote:


We are primarily using OpenMPI 3.1.4 but also have 4.0.1 installed.

On our cluster, we were running CentOS 7.5 with updates, alongside 
MLNX_OFED 4.5.x.   OpenMPI was compiled with GCC, Intel, PGI and 
AOCC compilers. We could run with no issues.


To accommodate updates needed to get our IB gear all running at 
HDR100 (EDR50 previously) we upgraded to CentOS 7.6.1810 and the 
current MLNX_OFED 4.6.x.


We can no longer reliably run on more than two nodes.

We see errors like:

[epyc-compute-3-2.local:42447] pml_ucx.c:380  Error: 
ucp_ep_create(proc=276) failed: Destination is unreachable
[epyc-compute-3-2.local:42447] pml_ucx.c:447  Error: Failed to 
resolve UCX endpoint for rank 276

[epyc-compute-3-2:42447] *** An error occurred in MPI_Allreduce
[epyc-compute-3-2:42447] *** reported by process 
[47894553493505,47893180318004]

[epyc-compute-3-2:42447] *** on communicator MPI_COMM_WORLD
[epyc-compute-3-2:42447] *** MPI_ERR_OTHER: known error not in list
[epyc-compute-3-2:42447] *** MPI_ERRORS_ARE_FATAL (processes in 
this communicator will now abort,

[epyc-compute-3-2:42447] ***    and potentially your MPI job)
[epyc-compute-3-17.local:36637] PMIX ERROR: UNREACHABLE in file 
server/pmix_server.c at line 2079
[epyc-compute-3-17.local:37008] pml_ucx.c:380  Error: 
ucp_ep_create(proc=147) failed: Destination is unreachable
[epyc-compute-3-17.local:37008] pml_ucx.c:447  Error: Failed to 
resolve UCX endpoint for rank 147
[epyc-compute-3-7.local:39776] 1 more process has sent help message 
help-mpi-errors.txt / mpi_errors_are_fatal
[epyc-compute-3-7.local:39776] Set MCA parameter 
"orte_base_help_aggregate" to 0 to see all help / error messages


UCX appears to be part of the MLNX_OFED release, and is version 1.6.0.

OpenMPI is built on the same OS and MLNX_OFED as we are running 
on the compute nodes.


I have a case open with Mellanox but it is not clear where this 
error is coming from.

--




--
Jeff Squyres
jsquy...@cisco.com


--
  
  Ray Muno

  IT Manager
  University of Minnesota
  Aerospace Engineering and Mechanics Mechanical Engineering
  



--
Jeff Squyres
jsquy...@cisco.com


--
 
 Ray Muno

 IT Manager
 University of Minnesota
 Aerospace Engineering and Mechanics Mechanical Engineering
 



[OMPI users] Parameters at run time

2019-10-19 Thread Raymond Muno via users
Is there a way to determine, at run time, what choices OpenMPI made in 
terms of the transports being used? We want to verify we are running 
UCX over InfiniBand.


I have two users, executing the identical code, with the same mpirun 
options, getting vastly different execution times on the same cluster.
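
A sketch of one standard way to check the selection, using MCA verbosity 
parameters that exist in OpenMPI 4.x (the exact log text varies by 
version, and ./a.out stands in for the real application):

   # Log PML component selection; a UCX run should report "pml ucx":
   mpirun --mca pml_base_verbose 10 -np 2 ./a.out 2>&1 | grep -i pml
   # Ask UCX itself to log what it is doing, including chosen transports:
   UCX_LOG_LEVEL=info mpirun -np 2 ./a.out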


--
 
 Ray Muno

 IT Manager
 University of Minnesota
 Aerospace Engineering and Mechanics Mechanical Engineering
 



Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Raymond Muno via users
We are running EPYC 7451 and 7702 nodes. I do not recall that CentOS 6 
was able to support these. We moved to CentOS 7.6 at first and are now 
running 7.7 to support the EPYC2/Rome nodes. The kernel in earlier 
releases did not support x2APIC and could not handle 256 threads. Not an 
issue on EPYC/Naples, but it was an issue on dual 64-core EPYC2.


Red Hat lists 7.4 as the minimum for EPYC (Naples) support and 7.6.6 for 
EPYC2 (Rome).


-Ray Muno

On 1/8/20 2:51 PM, Prentice Bisbal via users wrote:


On 1/8/20 3:30 PM, Brice Goglin via users wrote:

On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:

We just added about a dozen nodes to our cluster, which have AMD EPYC
7281 processors. When a particular users jobs fall on one of these
nodes, he gets these error messages:

-- 



WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

   Node:  dawson205


I wonder if the CentOS 6 kernel properly supports these recent
processors. Does lstopo show NUMA nodes as expected?

Brice

lstopo shows different numa nodes, and it appears to be correct, but I 
don't use lstopo that much, so I'm not 100%  confident that what it's 
showing is correct. I'm at about 98%.


Prentice


--
 
 Ray Muno

 IT Manager
 University of Minnesota
 Aerospace Engineering and Mechanics Mechanical Engineering




Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Raymond Muno via users
AMD lists the minimum supported kernel for EPYC/Naples as RHEL/CentOS 
kernel 3.10-862, which is RHEL/CentOS 7.5 or later. Upgraded kernels can 
be used in 7.4.


http://developer.amd.com/wp-content/resources/56420.pdf

-Ray Muno

On 1/8/20 7:37 PM, Raymond Muno wrote:
We are running EPYC 7451 and 7702 nodes. I do not recall that CentOS 6 
was able to support these. We moved to CentOS 7.6 at first and are now 
running 7.7 to support the EPYC2/Rome nodes. The kernel in earlier 
releases did not support x2APIC and could not handle 256 threads. Not an 
issue on EPYC/Naples, but it was an issue on dual 64-core EPYC2.


Red Hat lists 7.4 as the minimum for EPYC (Naples) support and 7.6.6 for 
EPYC2 (Rome).


-Ray Muno

On 1/8/20 2:51 PM, Prentice Bisbal via users wrote:


On 1/8/20 3:30 PM, Brice Goglin via users wrote:

On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:

We just added about a dozen nodes to our cluster, which have AMD EPYC
7281 processors. When a particular users jobs fall on one of these
nodes, he gets these error messages:

-- 



WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

   Node:  dawson205


I wonder if the CentOS 6 kernel properly supports these recent
processors. Does lstopo show NUMA nodes as expected?

Brice

lstopo shows different numa nodes, and it appears to be correct, but 
I don't use lstopo that much, so I'm not 100%  confident that what 
it's showing is correct. I'm at about 98%.


Prentice


--
 
 Ray Muno

 IT Manager
 University of Minnesota
 Aerospace Engineering and Mechanics Mechanical Engineering



[OMPI users] OpenMPI 4.0.2 with PGI 19.10, will not build with hcoll

2020-01-24 Thread Raymond Muno via users
I am having issues building OpenMPI 4.0.2 using the PGI 19.10 compilers. 
The OS is CentOS 7.7, with MLNX_OFED 4.7.3.


It dies at:

PGC/x86-64 Linux 19.10-0: compilation completed with warnings
  CCLD mca_coll_hcoll.la
pgcc-Error-Unknown switch: -pthread
make[2]: *** [mca_coll_hcoll.la] Error 1
make[2]: Leaving directory 
`/project/muno/OpenMPI/PGI/openmpi-4.0.2/ompi/mca/coll/hcoll'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/project/muno/OpenMPI/PGI/openmpi-4.0.2/ompi'
make: *** [all-recursive] Error 1

I tried with PGI 19.9 and had the same issue.

If I do not include hcoll, it builds.  I have successfully built OpenMPI 4.0.2 with GCC, Intel and 
AOCC compilers, all using the same options.


hcoll is provided by MLNX_OFED 4.7.3 and configure is run with

--with-hcoll=/opt/mellanox/hcoll
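
One workaround that has circulated for PGI's "Unknown switch: -pthread" 
error is to teach the PGI driver the flag through a siterc file. This is 
a sketch only; the install path and the exact siterc syntax are 
assumptions that can vary by PGI release:

   # Append to $PGI/linux86-64/19.10/bin/siterc (hypothetical path):
   switch -pthread is
       help(Link with the pthread library)
       append(LDLIB1=-lpthread);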


--

 
 Ray Muno

 IT Manager
 e-mail:m...@aem.umn.edu
  University of Minnesota
 Aerospace Engineering and Mechanics Mechanical Engineering
 



Re: [OMPI users] OpenMPI 4.1.1, CentOS 7.9, nVidia HPC-SDK, build hints?

2021-09-30 Thread Raymond Muno via users
 Added --enable-mca-no-build=op-avx to the configure line. Still dies 
in the same place.



    CCLD mca_op_avx.la
./.libs/liblocal_ops_avx512.a(liblocal_ops_avx512_la-op_avx_functions.o):(.data+0x0): multiple definition of `ompi_op_avx_functions_avx2'
./.libs/liblocal_ops_avx2.a(liblocal_ops_avx2_la-op_avx_functions.o):(.data+0x0): first defined here
./.libs/liblocal_ops_avx512.a(liblocal_ops_avx512_la-op_avx_functions.o): In function `ompi_op_avx_2buff_min_uint16_t_avx2':
/project/muno/OpenMPI/BUILD/SRC/openmpi-4.1.1/ompi/mca/op/avx/op_avx_functions.c:651: multiple definition of `ompi_op_avx_3buff_functions_avx2'
./.libs/liblocal_ops_avx2.a(liblocal_ops_avx2_la-op_avx_functions.o):/project/muno/OpenMPI/BUILD/SRC/openmpi-4.1.1/ompi/mca/op/avx/op_avx_functions.c:651: first defined here

make[2]: *** [mca_op_avx.la] Error 2
make[2]: Leaving directory `/project/muno/OpenMPI/BUILD/4.1.1/ROME/NV-HPC/21.9/ompi/mca/op/avx'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/project/muno/OpenMPI/BUILD/4.1.1/ROME/NV-HPC/21.9/ompi'
make: *** [all-recursive] Error 1

On 9/30/21 5:54 AM, Carl Ponder wrote:


For now, you can suppress this error building OpenMPI 4.1.1

./.libs/liblocal_ops_avx512.a(liblocal_ops_avx512_la-op_avx_functions.o):(.data+0x0): multiple definition of `ompi_op_avx_functions_avx2'
./.libs/liblocal_ops_avx2.a(liblocal_ops_avx2_la-op_avx_functions.o):(.data+0x0): first defined here
./.libs/liblocal_ops_avx512.a(liblocal_ops_avx512_la-op_avx_functions.o): In function `ompi_op_avx_2buff_min_uint16_t_avx2':
/project/muno/OpenMPI/BUILD/SRC/openmpi-4.1.1/ompi/mca/op/avx/op_avx_functions.c:651: multiple definition of `ompi_op_avx_3buff_functions_avx2'
./.libs/liblocal_ops_avx2.a(liblocal_ops_avx2_la-op_avx_functions.o):/project/muno/OpenMPI/BUILD/SRC/openmpi-4.1.1/ompi/mca/op/avx/op_avx_functions.c:651: first defined here

with the NVHPC/PGI 21.9 compiler by using the setting

configure --enable-mca-no-build=op-avx ...

We're still looking at the cause here. I don't have any advice about 
the problem with 21.7.



Subject:  Re: [OMPI users] OpenMPI 4.1.1, CentOS 7.9, nVidia HPC-SDK, build hints?
Date:     Wed, 29 Sep 2021 12:25:43 -0500
From:     Ray Muno via users
Reply-To: Open MPI Users
To:       users@lists.open-mpi.org
CC:       Ray Muno





Tried this:

configure CC='nvc -fPIC' CXX='nvc++ -fPIC' FC='nvfortran -fPIC'

Configure completes, and the build compiles quite a way through before 
dying in a different place. It does get past the first error with 
libmpi_usempif08.la, however.


FCLD libmpi_usempif08.la
make[2]: Leaving directory
`/project/muno/OpenMPI/BUILD/4.1.1/ROME/NV-HPC/21.9/ompi/mpi/fortran/use-mpi-f08'
Making all in mpi/fortran/mpiext-use-mpi-f08
make[2]: Entering directory
`/project/muno/OpenMPI/BUILD/4.1.1/ROME/NV-HPC/21.9/ompi/mpi/fortran/mpiext-use-mpi-f08'
PPFC mpi-f08-ext-module.lo
FCLD libforce_usempif08_module_to_be_built.la
make[2]: Leaving directory
`/project/muno/OpenMPI/BUILD/4.1.1/ROME/NV-HPC/21.9/ompi/mpi/fortran/mpiext-use-mpi-f08'

Dies here now.

    CCLD liblocal_ops_avx512.la
    CCLD mca_op_avx.la
./.libs/liblocal_ops_avx512.a(liblocal_ops_avx512_la-op_avx_functions.o):(.data+0x0): multiple definition of `ompi_op_avx_functions_avx2'
./.libs/liblocal_ops_avx2.a(liblocal_ops_avx2_la-op_avx_functions.o):(.data+0x0): first defined here
./.libs/liblocal_ops_avx512.a(liblocal_ops_avx512_la-op_avx_functions.o): In function `ompi_op_avx_2buff_min_uint16_t_avx2':
/project/muno/OpenMPI/BUILD/SRC/openmpi-4.1.1/ompi/mca/op/avx/op_avx_functions.c:651: multiple definition of `ompi_op_avx_3buff_functions_avx2'
./.libs/liblocal_ops_avx2.a(liblocal_ops_avx2_la-op_avx_functions.o):/project/muno/OpenMPI/BUILD/SRC/openmpi-4.1.1/ompi/mca/op/avx/op_avx_functions.c:651: first defined here
make[2]: *** [mca_op_avx.la] Error 2
make[2]: Leaving directory `/project/muno/OpenMPI/BUILD/4.1.1/ROME/NV-HPC/21.9/ompi/mca/op/avx'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/project/muno/OpenMPI/BUILD/4.1.1/ROME/NV-HPC/21.9/ompi'
make: *** [all-recursive] Error 1


On 9/29/21 11:42 AM, Bennet Fauber via users wrote:

Ray,

If all the errors about not being compiled with -fPIC are still 
appearing, there may be a bug that is preventing the option from getting 
through to the compiler(s). It might be worth looking through the logs 
to see the full compile command for one or more of them to see whether 
that is true? Say, libs/comm_spawn_multiple_f08.o, for example.

If -fPIC is missing, you may be able to recompile that manually with the 
-fPIC in place, then remake and see if that also causes the link error 
to go away; that would be a good start.
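
A concrete way to do that check, as a sketch (this assumes the build 
output is captured to a file; the object name is the example from above):

   make V=1 2>&1 | tee build.log             # V=1 prints full compile lines
   grep comm_spawn_multiple_f08 build.log    # inspect the command for -fPIC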

Hope this helps,
-- bennet



On Wed, Sep 29, 2021 at 12:29 PM Ray Muno via users 
<users@lists.open-mpi.org> wrote:

 I did try that an