Ok, thanks.  "Coordinating" with sys admins is problematic so I guess I'll just continue with the internal pmix and keep an eye out for problems.

At least I know I'm not doing anything blatantly stupid.

On 3/27/23 20:46, Pritchard Jr., Howard wrote:

Hi Craig,

It's not essential to use the pmix lib that was used to build the SLURM pmix plugin, but doing so does reduce the likelihood of problems.

I don’t know how, but there is some way for the admin installing SLURM to “name” the pmix plugins available through the --mpi option.

For instance on one of our systems, the admin has built multiple variants of the pmix plugin:

MPI plugin types are...

cray_shasta

none

pmi2

pmix

specific pmix plugin versions available: pmix_v2,pmix_v3,pmix_v314,pmix_v4,pmix_v422

This naming convention has helped us with “decoupling” building of Open MPI from SLURM build, but does mean some coordination with the sys admins.

We’re using SLURM 22.05.6

Hope this helps,

Howard

*From: *slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Craig <cfre...@super.org>
*Reply-To: *Slurm User Community List <slurm-users@lists.schedmd.com>
*Date: *Monday, March 27, 2023 at 2:01 PM
*To: *"slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
*Subject: *Re: [slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?

config.log...

    checking if user requested PMI support

    result: no

    checking if user requested internal PMIx support(yes)

    result: no

    checking for pmix.h in /usr

    result: not found

    checking for pmix.h in /usr/include

    result: not found

    WARNING: discovered external PMIx version is less than internal
    version 3.x

    WARNING: using internal PMIx

So it looks like it used the internal version (which is what I was aiming for), and that's ok by me since it seems to be working. But if I'm really supposed to be using the same one that SLURM used, then I'm going to have to figure out a way to determine what that was/is.
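The config.log check above can be scripted. The following is a minimal sketch, assuming the log contains the same "using internal PMIx" warning shown in the excerpt; other Open MPI versions may word it differently, so an "unknown" result only means the log needs to be read by hand. The function name `pmix_choice` is made up for illustration.

```shell
#!/bin/sh
# Report whether an Open MPI config.log shows the bundled PMIx being used.
# Assumption: the log contains the "using internal PMIx" warning quoted in
# this thread when the bundled copy is selected.
pmix_choice() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "unreadable"
        return 1
    fi
    if grep -qi 'using internal PMIx' "$log"; then
        echo "internal"
    else
        # No positive match; do not guess at wording for the external case.
        echo "unknown"
    fi
}
```

Run it from the Open MPI build directory as `pmix_choice config.log`.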

On 3/27/23 15:28, Pritchard Jr., Howard wrote:

    Hi Craig,

    Your use of --with-pmix on the Open MPI configure line is
    important.  Without any argument to this option, Open MPI's
    configure first checks whether there is an external pmix that is
    newer than the one included in the Open MPI release
    tarball.  If there is not, the internal pmix is built.

    You can check in the config.log whether the internal PMIx or an
    external one was used.

    If you want to be extra careful, find the location of the PMIx v3
    used to build the SLURM PMIx plugin, and then rebuild your open
    mpi 4.1.5 with

    ./configure …
    --with-pmix=path_to_pmix_used_for_slurm_pmix_plugin_build ….
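A small guard can catch a wrong path before a long rebuild. This is a sketch only: the helper name `pmix_configure_arg` is hypothetical, and it assumes that a usable PMIx install has `pmix.h` either directly under the given directory or under its `include/` subdirectory, mirroring the two locations probed in the config.log excerpt in this thread.

```shell
#!/bin/sh
# Print a --with-pmix=<dir> argument only if <dir> looks like a PMIx install.
# Assumption: pmix.h lives in <dir> or <dir>/include, as in the config.log
# probes quoted in this thread.
pmix_configure_arg() {
    prefix=$1
    if [ -f "$prefix/include/pmix.h" ] || [ -f "$prefix/pmix.h" ]; then
        printf '%s\n' "--with-pmix=$prefix"
    else
        echo "no pmix.h found under $prefix" >&2
        return 1
    fi
}
```

For example, `./configure "$(pmix_configure_arg /opt/pmix-v3)" --with-slurm`, where `/opt/pmix-v3` is a placeholder for wherever the PMIx used for the SLURM plugin actually lives.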

    But you may be okay without doing this. You can check this by
    running your open mpi job with

    srun --mpi=pmix_v3 -N2 foo

    and see if it behaves as expected.

    I’m not sure what the “openmpi” result from srun --mpi=list is about.

    Howard

    *From: *slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Craig <cfre...@super.org>
    *Reply-To: *Slurm User Community List <slurm-users@lists.schedmd.com>
    *Date: *Monday, March 27, 2023 at 12:54 PM
    *To: *"slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
    *Subject: *Re: [slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?

    srun: MPI types are...

    srun: none

    srun: openmpi

    srun: pmix_v3

    srun: pmi2

    srun: pmix

    but I'm not sure that tells me much about how I am supposed to be
    building OpenMPI?
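The entries that matter for the --mpi option in a listing like this are the pmix ones. As a rough sketch, assuming the output has the "srun: name" shape shown above, a filter can pull them out. The function name `list_pmix_plugins` is invented for this example.

```shell
#!/bin/sh
# Read "srun --mpi=list" style output on stdin and print only the pmix
# plugin names (pmix, pmix_v3, ...), one per line.
# Assumption: each plugin appears on its own line, optionally prefixed
# with "srun: ", as in the listing quoted in this thread.
list_pmix_plugins() {
    sed -e 's/^[[:space:]]*//' \
        -e 's/^srun:[[:space:]]*//' \
        -e 's/[[:space:]]*$//' |
        grep -E '^pmix(_v[0-9]+)?$'
}
```

Typical use would be `srun --mpi=list 2>&1 | list_pmix_plugins`; the `2>&1` is there in case the listing is written to stderr rather than stdout.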

    On 3/27/23 14:41, Pritchard Jr., Howard wrote:

        Hi Craig,

        If you run

        srun --mpi=list

        what does slurm report?

        That will help in determining what argument you want to supply
        for the --mpi srun option.

        Howard

        *From: *slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Craig <cfre...@super.org>
        *Reply-To: *Slurm User Community List <slurm-users@lists.schedmd.com>
        *Date: *Monday, March 27, 2023 at 12:38 PM
        *To: *"slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
        *Subject: *[EXTERNAL] [slurm-users] OpenMPI and Slurm clarification?


        Can someone please clarify the "best practices" for building
        OpenMPI compatible with Slurm?

        https://slurm.schedmd.com/mpi_guide.html#open_mpi
        tells me what I _can_ do but I'm unclear as to what I _should_
        do.

        I've built OpenMPI 4.1.5 with:   --with-pmix
        --with-libevent=internal  --with-hwloc=internal --with-slurm. 
        If I run an MPI program on my cluster (slurm 18.08.8) with
        "srun -N2 foo" it seems to work fine. (slurm.conf has
        MpiDefault=pmix).

        If I "srun --mpi=openmpi -N2 foo" it chokes with:

            OPAL_ERROR: Unreachable in file
            ../../../../../opal/mca/pmix/pmix3/pmix3x_client.c at line 112
            --------------------------------------------------------------------------
            This application appears to have been direct launched using "srun",
            but OMPI was not built with SLURM's PMI support and therefore cannot
            execute.  There are several options for building PMI support under
            SLURM, depending upon the SLURM version you are using:

            version 16.05 or later: you can use SLURM's PMIx support. This
            requires that you configure and build SLURM --with-pmix.
            .
            .
            .


        So I guess the question is: what is the "right" way to build
        OpenMPI with Slurm?  Is the fact that my non-Slurm pmix works
        "correct", or am I just getting lucky that the various software
        I have happens to be compatible?  If I build OpenMPI, am I
        supposed to use Slurm's pmix/libevent/hwloc, or is that
        optional?  If it's optional, when/why might I choose to do so?
        If I need Slurm's versions, is there some way to find which
        pmix/libevent/hwloc my current Slurm install is using? Note:
        my sysadmins are not going to be helpful as they think Slurm
        18 and OpenMPI 4.0.2a is adequate for users' needs :^(.

        I like the idea of _not_ tying my OpenMPI to the installed
        Slurm just in case our support people ever decide to upgrade
        system software.

        Thanks.
