Re: [OMPI users] Issues with compilers

2021-02-01 Thread Jeff Squyres (jsquyres) via users
Can you `make V=1` for that orte-info build?

That will show us the exact command that was used to build orte-info (vs. just
the abbreviated "CC orte-info.o" output).


On Feb 1, 2021, at 4:50 AM, Alvaro Payero Pinto 
<10alvaro...@gmail.com> wrote:

Hi Jeff,

First of all, I sent this message a week ago but I am not sure whether it arrived.
Sorry for the delay; I could not get to the office to try the solutions you proposed
until today.
The idea is to compile Open MPI itself with the Intel Fortran libraries statically
linked. I have only found these dependencies in libmpi_usempi_ignore_tkr.so and
libmpi_usempif08.so.
Following the recommendation you gave in situation #2, I implemented a first version
of the wrapper that strips "-static-intel" and "-Wc,-static-intel" only for C/C++.
--- gcc_wrap (in a folder in PATH) ---
#!/bin/bash
# Strip the Intel-only static-linking flags before handing the rest to gcc.
WORDTOREMOVE1="-static-intel"
WORDTOREMOVE2="-Wc,$WORDTOREMOVE1"
# Remove "-Wc,-static-intel" first, then any remaining bare "-static-intel".
ARGS=$(echo ${*//$WORDTOREMOVE2/})
ARGS=$(echo ${ARGS//$WORDTOREMOVE1/})
gcc $ARGS
---
g++_wrap is essentially the same, with g++ in place of gcc. The configure call now
becomes:

./configure --prefix=/usr/local/ --libdir=/usr/local/lib64/ --includedir=/usr/local/include/ \
    CC=gcc_wrap CXX=g++_wrap 'FLAGS=-O2 -m64' 'CFLAGS=-O2 -m64' 'CXXFLAGS=-O2 -m64' \
    FC=ifort 'FCFLAGS=-O2 -m64' LDFLAGS=-Wc,-static-intel
Everything goes fine during configure, but the build crashes when it tries to compile
orte-info. I enclose the outputs again; configure.out contains the output of both
configure and the compilation.
Kind regards,
Álvaro


On Fri, Jan 22, 2021 at 16:09, Jeff Squyres (jsquyres)
<jsquy...@cisco.com> wrote:
On Jan 22, 2021, at 9:49 AM, Alvaro Payero Pinto via users
<users@lists.open-mpi.org> wrote:
>
> I am trying to install Open MPI with the Intel compiler suite for the Fortran
> side and the GNU compiler suite for the C side. For reasons beyond my control,
> I am not allowed to switch the C compiler suite to Intel, since that would
> require an additional license.

Yoinks.  I'll say right off that this will be a challenge.

> The problem is that the installation should not depend dynamically on Intel
> libraries, so the flag "-static-intel" (or similar) should be passed to the
> Fortran compiler. I've seen in the FAQ that this is solved by passing the
> Autotools option "-Wc,-static-intel" in the LDFLAGS variable when invoking
> configure with the Intel compilers. This works if both the C/C++ and Fortran
> compilers are from Intel. However, it crashes if the compiler suite is mixed,
> since GNU C/C++ does not recognise the "-static-intel" option.

The problem is that the same LDFLAGS value is used for all 3 languages (C, C++, 
Fortran), because they can all be compiled into a single application.  So the 
Autotools don't separate out different LDFLAGS for the different languages.

> Is there any way to bypass this crash and to indicate that such option should 
> only be passed when using Fortran compiler?

Keep in mind that there are also two different cases here:

1. When compiling Open MPI itself
2. When compiling MPI applications

You can customize the behavior of the mpifort wrapper compiler by editing 
share/openmpi/mpifort-wrapper-data.txt.
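For example, a hypothetical one-liner that prepends the flag to the Fortran wrapper's link flags (the linker_flags key name is from memory; check your installed file first):

sed -i 's/^linker_flags=/linker_flags=-static-intel /' \
    /usr/local/share/openmpi/mpifort-wrapper-data.txt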

#1 is likely to be a bit more of a challenge.

...but the thought occurs to me that #2 may be sufficient.  You might want to 
try it and see if your MPI applications have the Intel libraries statically 
linked, and that's enough...?
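A quick sanity check along these lines should tell you (a sketch; hello.f90 is just a stand-in, and the names to grep for are the usual Intel Fortran runtime libraries):

mpifort -o hello hello.f90
ldd ./hello | grep -Ei 'ifcore|ifport|intel'   # no output = no dynamic Intel deps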

> Configure call to reproduce the crash is made as follows:
>
> ./configure --prefix=/usr/local/ --libdir=/usr/local/lib64/ 
> --includedir=/usr/local/include/ CC=gcc CXX=g++ 'FLAGS=-O2 -m64' 'CFLAGS=-O2 
> -m64' 'CXXFLAGS=-O2 -m64' FC=ifort 'FCFLAGS=-O2 -m64' 
> LDFLAGS=-Wc,-static-intel

The other, slightly more invasive mechanism you could try if #2 is not 
sufficient is to write your own wrapper compiler script that intercepts / 
strips out -Wc,-static-intel for the C and C++ compilers.  For example:

./configure CC=my_gcc_wrapper.sh CXX=my_g++_wrapper.sh ... 
LDFLAGS=-Wc,-static-intel

Those two scripts are simple shell scripts that strip -Wc,-static-intel if they see
it, but otherwise just invoke gcc/g++ with all the other arguments.
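A minimal sketch of what such a wrapper might look like (the file name is just for illustration):

#!/bin/bash
# my_gcc_wrapper.sh: drop the Intel-only flags, pass everything else to gcc.
args=()
for a in "$@"; do
  case "$a" in
    -Wc,-static-intel|-static-intel) ;;   # silently drop these
    *) args+=("$a") ;;
  esac
done
exec gcc "${args[@]}"

Using "$@" and an array keeps any quoted arguments intact.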

It's a gross hack, but it might work.

--
Jeff Squyres
jsquy...@cisco.com




--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] OpenMPI version compatibility with libfabric, UCX, etc...

2021-02-01 Thread Jeff Squyres (jsquyres) via users
On Jan 26, 2021, at 3:03 PM, Craig via users
<users@lists.open-mpi.org> wrote:

Is there a table somewhere that tells me what version of things like libfabric 
and UCX
(and maybe compiler versions if there are known issues) are known to be good 
with
which versions of OpenMPI?  I poked around in the README, FAQ, Download sections
and for the life of me can't seem to find it.

I don't think we have a definitive list like this, sorry.

In general, the latest versions of the dependent packages are a good bet.

Compiler support has been pretty stable for a while.

The UCX folks recently wondered if they should stop supporting anything before
1.9.0 (see
https://github.com/open-mpi/ompi/issues/8321#issuecomment-769951617).  So if
you're using UCX, perhaps go with 1.9.0 or newer.

For Libfabric, I'd generally say the same thing: use the most recent release 
and you should be ok.
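If it helps, a quick way to see what your build actually picked up (these are the usual UCX / Open MPI commands; the exact output format varies by version):

ucx_info -v                 # prints the UCX version and build configuration
ompi_info | grep -i ucx     # shows whether Open MPI was built with UCX support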

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] High errorcode message

2021-02-01 Thread Arturo Fernandez via users

The app is not calling MPI_ABORT directly. I dug a little deeper into it but
didn't find anything interesting. It simply could not find the subdirectory used
for output (the internal error variable is 0) and crashed when returning from
the subroutine. It was just me not setting things up properly, and everything
seems to be working fine now.
Jeff Squyres (jsquyres) wrote:
Is your app calling MPI_Abort directly? There's a 2nd argument to MPI_ABORT
that should be passed through to the output message. If it's not, we should
investigate that.
Or is your app aborting in some other, indirect way? If so, perhaps somehow
that 2nd argument is getting dropped somewhere along the way, and
the number you're seeing in the message is effectively an uninitialized
integer. That's probably not *too* alarming in this case (because you're
aborting, after all). But it would probably be good to understand that code
path and fix it up if there's something wrong.
On Jan 30, 2021, at 11:30 AM, Arturo Fernandez <afernan...@odyhpc.com> wrote:
Hi Jeff. Sorry for the delay. It took a while, but I was finally able to track
down the point where the app breaks down. The problem seems to originate in an
output subroutine, not in any malfunctioning MPI communication. My guess is that
MPI_Abort needs to produce some error message. Why the high number? Not sure.
Thanks.
Arturo
--
Jeff Squyres
jsquy...@cisco.com