Davide,

That is the consequence of an Open MPI bug in the PMI detection.


For the time being, you can use the attached patch (not a final one, though).

Note that you need to have recent autotools installed, and then you will have to run

autogen.sh --force

before rebuilding Open MPI.
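
As a rough sketch of the rebuild sequence described above, the function below prints the commands instead of running them, so they can be reviewed first. The patch file name, the install prefix, and the configure options are assumptions for illustration and are not taken from this thread. The message names autogen.sh; Open MPI 2.x source trees may ship autogen.pl instead, so the sketch prints whichever script it finds.

```shell
# Print (do not run) the rebuild steps for a patched Open MPI source tree.
# Call it from the top of the source tree.
#   $1: path to the saved patch (hypothetical name, e.g. pmi.patch)
#   $2: install prefix (hypothetical, e.g. /opt/openmpi)
print_rebuild_steps() {
    patch_file=$1
    prefix=$2
    # The diff uses a/ and b/ path prefixes, hence -p1.
    echo "patch -p1 < $patch_file"
    # Prefer autogen.pl when the tree has it, else the autogen.sh named in the message.
    if [ -f ./autogen.pl ]; then
        echo "./autogen.pl --force"
    else
        echo "./autogen.sh --force"
    fi
    # --with-pmi without a path, as tried earlier in the thread (an assumption here).
    echo "./configure --prefix=$prefix --with-pmi"
    echo "make all install"
}
```

For example, `print_rebuild_steps pmi.patch /opt/openmpi` lists the four steps in order.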



Cheers,


Gilles


On 2/23/2018 2:42 AM, Vanzo, Davide wrote:
Jeff,
I have to resuscitate this thread since the issue is still there even on version 2.1.1. I have also tried with --with-pmi without a path, but the -L/usr/lib64 is still added to the wrapper. If I do not add it at all, then launching an MPI application with srun will not work.
What would you suggest to try?
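
One way to check whether the flag is still being added is to inspect what the wrapper compiler reports. The helper below is only a sketch: the function name is made up for illustration, and it assumes the wrapper output is piped in on stdin (for example from `mpifort --showme:link`).

```shell
# Succeed if the exact flag $1 appears as a separate token on stdin.
# has_flag is a hypothetical helper name, not part of Open MPI.
has_flag() {
    flag=$1
    # Put each whitespace-separated token on its own line, then
    # look for a whole-line, fixed-string match.
    tr -s '[:space:]' '\n' | grep -Fxq -- "$flag"
}
```

Usage would look like `mpifort --showme:link | has_flag -L/usr/lib64 && echo "still injected"`. Matching whole tokens means a longer path such as -L/usr/lib64/slurm is not reported as a hit.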
--
*Davide Vanzo, PhD*
Application Developer
Adjunct Assistant Professor of Chemical and Biomolecular Engineering
Advanced Computing Center for Research and Education (ACCRE)
Vanderbilt University - Hill Center 201
(615)-875-9137
www.accre.vanderbilt.edu

On 2017-11-29 16:25:16-06:00 Vanzo, Davide wrote:

    Jeff,

    Thanks for pointing me in the right direction. I have finally
    figured out what the problem is.

    On the cluster we install Slurm via RPMs and the PMI/PMI2
    libraries are in /usr/lib64. Hence the -L/usr/lib64 flag is the
    effect of the --with-pmi=/usr configure flag. The good thing is
    that even when that flag is omitted, the final binary is correctly
    linked to the PMI libraries.

    The build worked on the other system I tested because Slurm is
    not installed on it.
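
A quick way to confirm that a binary or shared object is linked to the PMI libraries is to filter its `ldd` output. This is a sketch only: the function name is hypothetical, and it assumes the Slurm libraries are named libpmi.so and libpmi2.so.

```shell
# Keep only the PMI/PMI2 entries from ldd output read on stdin.
# Exits non-zero when no PMI library is listed.
pmi_deps() {
    grep -E 'libpmi2?\.so'
}
```

For example, `ldd ./my_mpi_app | pmi_deps`, where my_mpi_app is a placeholder for the file being checked.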


    On 2017-11-29 16:07:04-06:00 Jeff Squyres (jsquyres) wrote:

        On Nov 29, 2017, at 4:51 PM, Vanzo, Davide <davide.va...@vanderbilt.edu> wrote:
        >
        > Although tempting, changing the version of OpenMPI would mean a
        > significant amount of changes in our software stack.

        Understood.

        FWIW: the only differences between 1.10.3 and 1.10.7 were bug fixes
        (including, I'm assuming -- I haven't tested myself -- this -L issue).
        Hypothetically, it should be a fairly painless upgrade.

        > Hence I would like to find out what the problem is and hopefully
        > its solution.
        >
        > Where is the -L/usr/lib64 injected? Is there a way to patch the code
        > so that it does not get added to the list of options to gfortran?

        It's injected pretty deep inside configure.

        We might be able to spelunk through the git logs to find the commit
        that fixes this issue and you could apply that as a patch, but it might
        be easier to just manually patch up the wrapper compiler data file
        after the build.

        Specifically, it looks like OMPI 1.10.3 is installing faulty values in
        $prefix/share/openmpi/*-wrapper-data.txt.  You can easily edit these
        files directly and remove the erroneous -L/usr/lib64.  If you're unable
        to upgrade to 1.10.7, patching the installed *-wrapper-data.txt files
        is probably your best bet.
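
The manual edit of the wrapper data files described above could be scripted roughly as follows. This is a sketch, not a tested procedure: the function name is made up, the directory argument stands for $prefix/share/openmpi, and it assumes the flag appears as a separate whitespace-delimited token (or directly after an "=").

```shell
# Remove the standalone token -L/usr/lib64 from every
# *-wrapper-data.txt file in directory $1, keeping a .bak copy of each.
# strip_usr_lib64 is a hypothetical helper name.
strip_usr_lib64() {
    dir=$1
    for f in "$dir"/*-wrapper-data.txt; do
        # Skip the unexpanded glob when no file matches.
        [ -f "$f" ] || continue
        # Match the flag only when it is bounded by line start, whitespace
        # or '=' on the left and whitespace or line end on the right, so
        # longer paths such as -L/usr/lib64/slurm are left alone.
        sed -i.bak -E 's#(^|[[:space:]=])-L/usr/lib64([[:space:]]|$)#\1\2#g' "$f"
    done
}
```

It would be called as `strip_usr_lib64 "$prefix/share/openmpi"`. The .bak copies make it easy to compare or roll back.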

        -- Jeff Squyres
        jsquy...@cisco.com

        _______________________________________________
        users mailing list
        users@lists.open-mpi.org
        https://lists.open-mpi.org/mailman/listinfo/users




diff --git a/config/opal_check_pmi.m4 b/config/opal_check_pmi.m4
index dee6d94..fb05ca6 100644
--- a/config/opal_check_pmi.m4
+++ b/config/opal_check_pmi.m4
@@ -174,10 +174,10 @@ AC_DEFUN([OPAL_CHECK_PMI],[
                               [opal_enable_pmi1=no])
 
            AS_IF([test "$opal_enable_pmi1" = "yes"],
-                 [AS_IF([test "$default_pmi_loc" = "no" || test "$slurm_pmi_found" = "yes"],
+                 [AS_IF([test "$slurm_pmi_found" = "yes"],
                        [opal_pmi1_CPPFLAGS="$pmi_CPPFLAGS"
                          AC_SUBST(opal_pmi1_CPPFLAGS)])
-                  AS_IF([test "$default_pmi_libloc" = "no" || test "$slurm_pmi_found" = "yes"],
+                  AS_IF([test "$default_pmi_libloc" = "no" && test "$slurm_pmi_found" = "yes"],
                        [opal_pmi1_LDFLAGS="$pmi_LDFLAGS"
                          AC_SUBST(opal_pmi1_LDFLAGS)
                          opal_pmi1_rpath="$pmi_rpath"
@@ -195,10 +195,10 @@ AC_DEFUN([OPAL_CHECK_PMI],[
                               [opal_enable_pmi2=no])
 
            AS_IF([test "$opal_enable_pmi2" = "yes"],
-                 [AS_IF([test "$default_pmi_loc" = "no" || test "$slurm_pmi_found" = "yes"],
+                 [AS_IF([test "$slurm_pmi_found" = "yes"],
                        [opal_pmi2_CPPFLAGS="$pmi2_CPPFLAGS"
                          AC_SUBST(opal_pmi2_CPPFLAGS)])
-                  AS_IF([test "$default_pmi_libloc" = "no" || test "$slurm_pmi_found" = "yes"],
+                  AS_IF([test "$default_pmi_libloc" = "no" && test "$slurm_pmi_found" = "yes"],
                        [opal_pmi2_LDFLAGS="$pmi2_LDFLAGS"
                          AC_SUBST(opal_pmi2_LDFLAGS)
                          opal_pmi2_rpath="$pmi2_rpath"
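
The effect of the LDFLAGS change in the patch can be illustrated with the two conditions written as plain shell tests. This is a sketch under an assumption not stated in the thread: that default_pmi_libloc is "yes" when the PMI libraries were found in a default system directory such as /usr/lib64. The function names are hypothetical.

```shell
# $1: value of default_pmi_libloc, $2: value of slurm_pmi_found

# Condition before the patch: export the PMI LDFLAGS if either test holds.
old_exports_ldflags() {
    test "$1" = "no" || test "$2" = "yes"
}

# Condition after the patch: export them only if both tests hold.
new_exports_ldflags() {
    test "$1" = "no" && test "$2" = "yes"
}
```

With Slurm's PMI found in a default location (`yes yes`), the old condition succeeds, so the -L flag would be exported, while the new condition fails and the flag is left out. For a non-default location with Slurm's PMI found (`no yes`), both conditions succeed.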