Davide,
This is caused by an Open MPI bug in the PMI detection.
For the time being, you can use the attached patch (it is not the final fix, though).
Note that you need recent autotools installed, and you will then have to run
autogen.sh --force
before rebuilding Open MPI.
Cheers,
Gilles
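For reference, the steps above could be scripted roughly as follows. This is only a sketch: the source directory, the patch file name, the install prefix and the helper names (run, rebuild_with_patch) are placeholders that do not come from the message, and the autogen script name is taken as written above (override AUTOGEN if your tree ships it under another name). Set DRY_RUN=1 to print the commands instead of running them.

```shell
# Sketch only: all paths and the patch file name are placeholders.
# With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_with_patch() {
    src_dir=$1      # unpacked Open MPI source tree
    patch_file=$2   # the attached patch saved to a file (absolute path)
    prefix=$3       # install prefix
    # The message names autogen.sh; set AUTOGEN if your tree differs.
    autogen=${AUTOGEN:-./autogen.sh}

    run cd "$src_dir" &&
    run patch -p1 -i "$patch_file" &&                  # diff uses a/ and b/ prefixes
    run "$autogen" --force &&                          # regenerate configure
    run ./configure --prefix="$prefix" --with-pmi &&
    run make &&
    run make install
}
```

For example, `DRY_RUN=1 rebuild_with_patch /path/to/openmpi /path/to/pmi.patch /opt/openmpi` lists the six commands without touching anything.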
On 2/23/2018 2:42 AM, Vanzo, Davide wrote:
Jeff,
I have to resuscitate this thread since the issue is still there even
on version 2.1.1.
I have also tried --with-pmi without a path, but -L/usr/lib64
is still added to the wrapper. If I omit --with-pmi altogether, then
launching an MPI application with srun does not work.
What would you suggest to try?
--
*Davide Vanzo, PhD*
Application Developer
Adjunct Assistant Professor of Chemical and Biomolecular Engineering
Advanced Computing Center for Research and Education (ACCRE)
Vanderbilt University - Hill Center 201
(615)-875-9137
www.accre.vanderbilt.edu
On 2017-11-29 16:25:16-06:00 Vanzo, Davide wrote:
Jeff,
Thanks for pointing me in the right direction. I have finally
figured out what the problem is.
On the cluster we install Slurm via RPMs and the PMI/PMI2
libraries are in /usr/lib64. Hence the -L/usr/lib64 flag is the
effect of the --with-pmi=/usr configure flag. The good thing is
that even by omitting it the final binary is correctly linked to
the PMI libraries.
The build worked on the other system I tested because Slurm is
not installed there.
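A quick way to check both points (what the wrapper adds, and whether the binary still picks up PMI) might look like the sketch below. The helper name has_lib64_flag and the binary name my_app are made up for illustration; the wrapper may be mpif90 rather than mpifort depending on the installation.

```shell
# Succeeds if the given flag string contains -L/usr/lib64 as its own word.
has_lib64_flag() {
    case " $1 " in
        *" -L/usr/lib64 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage (not run here):
#   has_lib64_flag "$(mpifort --showme:link)" && echo "wrapper adds -L/usr/lib64"
#   ldd ./my_app | grep -i pmi     # shows whether PMI libraries are linked
```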
On 2017-11-29 16:07:04-06:00 Jeff Squyres (jsquyres) wrote:
On Nov 29, 2017, at 4:51 PM, Vanzo, Davide
<davide.va...@vanderbilt.edu> wrote:
>
> Although tempting, changing the version of OpenMPI would mean a
> significant amount of changes in our software stack.
Understood.
FWIW: the only differences between 1.10.3 and 1.10.7 were bug fixes
(including, I'm assuming -- I haven't tested myself -- this -L issue).
Hypothetically, it should be a fairly painless upgrade.
> Hence I would like to find out what the problem is and hopefully its
> solution.
>
> Where is the -L/usr/lib64 injected? Is there a way to patch the code so
> that it does not get added to the list of options to gfortran?
It's injected pretty deep inside configure.
We might be able to spelunk through the git logs to find the commit
that fixes this issue and you could apply that as a patch, but it might be
easier to just manually patch up the wrapper compiler data file after the build.
Specifically, it looks like OMPI 1.10.3 is installing faulty values in
$prefix/share/openmpi/*-wrapper-data.txt. You can easily edit these files
directly and remove the erroneous -L/usr/lib64. If you're unable to upgrade to
1.10.7, patching the installed *-wrapper-data.txt files is probably your best
bet.
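If you would rather script the edit than do it by hand, something along these lines should work. It is a sketch, not a tested recipe for your install: the function name strip_lib64_flag is made up, and it assumes the flag appears as a separate word (after a space or an "=") in the data files. It keeps a .orig copy of each file and leaves longer paths such as -L/usr/lib64/foo alone.

```shell
# Remove every stand-alone "-L/usr/lib64" word from one wrapper data file.
# $1: path to a *-wrapper-data.txt file. A backup is kept as $1.orig.
strip_lib64_flag() {
    file=$1
    cp -p "$file" "$file.orig" || return 1
    # Loop (:a ... ta) so adjacent repeated flags are all removed.
    sed -E -e ':a' \
        -e 's#(^|[[:space:]=])-L/usr/lib64([[:space:]]|$)#\1\2#' \
        -e 'ta' "$file.orig" > "$file"
}

# Possible usage, with $prefix set to the Open MPI install prefix:
#   for f in "$prefix"/share/openmpi/*-wrapper-data.txt; do
#       strip_lib64_flag "$f"
#   done
```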
--
Jeff Squyres
jsquy...@cisco.com
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
diff --git a/config/opal_check_pmi.m4 b/config/opal_check_pmi.m4
index dee6d94..fb05ca6 100644
--- a/config/opal_check_pmi.m4
+++ b/config/opal_check_pmi.m4
@@ -174,10 +174,10 @@ AC_DEFUN([OPAL_CHECK_PMI],[
[opal_enable_pmi1=no])
AS_IF([test "$opal_enable_pmi1" = "yes"],
- [AS_IF([test "$default_pmi_loc" = "no" || test "$slurm_pmi_found" = "yes"],
+ [AS_IF([test "$slurm_pmi_found" = "yes"],
[opal_pmi1_CPPFLAGS="$pmi_CPPFLAGS"
AC_SUBST(opal_pmi1_CPPFLAGS)])
- AS_IF([test "$default_pmi_libloc" = "no" || test "$slurm_pmi_found" = "yes"],
+ AS_IF([test "$default_pmi_libloc" = "no" && test "$slurm_pmi_found" = "yes"],
[opal_pmi1_LDFLAGS="$pmi_LDFLAGS"
AC_SUBST(opal_pmi1_LDFLAGS)
opal_pmi1_rpath="$pmi_rpath"
@@ -195,10 +195,10 @@ AC_DEFUN([OPAL_CHECK_PMI],[
[opal_enable_pmi2=no])
AS_IF([test "$opal_enable_pmi2" = "yes"],
- [AS_IF([test "$default_pmi_loc" = "no" || test "$slurm_pmi_found" = "yes"],
+ [AS_IF([test "$slurm_pmi_found" = "yes"],
[opal_pmi2_CPPFLAGS="$pmi2_CPPFLAGS"
AC_SUBST(opal_pmi2_CPPFLAGS)])
- AS_IF([test "$default_pmi_libloc" = "no" || test "$slurm_pmi_found" = "yes"],
+ AS_IF([test "$default_pmi_libloc" = "no" && test "$slurm_pmi_found" = "yes"],
[opal_pmi2_LDFLAGS="$pmi2_LDFLAGS"
AC_SUBST(opal_pmi2_LDFLAGS)
opal_pmi2_rpath="$pmi2_rpath"
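Read as plain shell, the LDFLAGS test in the patch changes from an OR to an AND. The sketch below only illustrates that boolean change; the function names are made up, and it assumes both variables hold the literal strings "yes" or "no" as the tests in the patch suggest.

```shell
# Arguments: $1 = default_pmi_libloc, $2 = slurm_pmi_found ("yes" or "no").
old_ldflags_cond() { test "$1" = "no" || test "$2" = "yes"; }
new_ldflags_cond() { test "$1" = "no" && test "$2" = "yes"; }

# Print, for each combination, whether each test would export the PMI LDFLAGS.
compare_conds() {
    for libloc in yes no; do
        for slurm in yes no; do
            old=no; new=no
            if old_ldflags_cond "$libloc" "$slurm"; then old=yes; fi
            if new_ldflags_cond "$libloc" "$slurm"; then new=yes; fi
            echo "default_pmi_libloc=$libloc slurm_pmi_found=$slurm old=$old new=$new"
        done
    done
}
```

With default_pmi_libloc=yes and slurm_pmi_found=yes, the old test passes and the new one does not, so the PMI LDFLAGS are no longer exported in that combination.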