Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-08 Thread Bennet Fauber
Hi, Artem,

Thanks for the reply.  I'll answer a couple of questions inline below.

One odd thing that I see in the error output that you have provided is that
pmix2x_client.c is active.

> Looking into the v3.1.x branch (
> https://github.com/open-mpi/ompi/tree/v3.1.x/opal/mca/pmix) I see the
> following components:
> * ext1x
> * ext2x
> ...
> *pmix2x
>
> Pmix2x_client is in the internal pmix2x component, which shouldn't be built
> if the external ext2x component was configured. At least that was the case before.
> According to the output it fails on PMIx_Init().
> Can you please do "$ ls mca_pmix_*" in the /lib/openmpi
> directory?
>

$ ls mca_pmix*
mca_pmix_flux.la  mca_pmix_isolated.la  mca_pmix_pmix2x.la
mca_pmix_flux.so  mca_pmix_isolated.so  mca_pmix_pmix2x.so
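
(As an aside, one quick way to tell whether that pmix2x component references the
external libpmix at all is something like the sketch below; the lib/openmpi path
under the install prefix is an assumption based on the usual Open MPI layout, not
something confirmed in this thread.  If no /opt/pmix/2.0.2 line appears, the
component is the internal one.)

  $ cd /sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0/lib/openmpi
  $ ldd mca_pmix_pmix2x.so | grep -i pmix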

> Another thing that caught my eye: you say that OMPI searches for PMIx 3.x:
> ...
> > It fails on the test for PMIx 3, which is expected, but then
> > reports
> >
> >
> > configure:12843: checking version 2x
> > configure:12861: gcc -E -I/opt/pmix/2.0.2/include  conftest.c
> > configure:12861: $? = 0
> > configure:12862: result: found
> >
>
> But OMPI v3.1.x doesn't have such a component. Can you provide the related
> lines from config.log?
>


Here are the relevant lines.

configure:12680: checking if user requested external PMIx
support(/opt/pmix/2.0.2)
configure:12690: result: yes
configure:12701: checking --with-external-pmix value
configure:12725: result: sanity check ok (/opt/pmix/2.0.2/include)
configure:12768: checking libpmix.* in /opt/pmix/2.0.2/lib64
configure:12774: checking libpmix.* in /opt/pmix/2.0.2/lib
configure:12794: checking PMIx version
configure:12804: result: version file found
configure:12812: checking version 3x
configure:12830: gcc -E -I/opt/pmix/2.0.2/include  conftest.c
conftest.c:95:56: error: #error "not version 3"

I believe that is a red herring.  Some time in the past, I was told that
there is an anticipatory test for pmix3, and since there isn't such a
thing, this is expected to fail.
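
(For the curious, the probe that configure runs is roughly along these lines;
this is a sketch, and the exact conftest contents and the PMIX_VERSION_MAJOR
macro are assumptions about how the check works rather than a copy of OMPI's
configury.  With PMIx 2.0.2 installed, the preprocessor hits the #error, which
is why the "version 3x" check is expected to fail.)

  $ cat > conftest.c <<'EOF'
  #include <pmix_version.h>
  #if (PMIX_VERSION_MAJOR != 3L)
  #error "not version 3"
  #endif
  EOF
  $ gcc -E -I/opt/pmix/2.0.2/include conftest.c > /dev/null
  conftest.c:3:2: error: #error "not version 3"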



> Now about debugging of what is happening:
> 1. I'd like to see results with PMIx debug on:
> $ env PMIX_DEBUG=100 srun --mpi=pmix_v2 ...
>
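
(A concrete form of that suggestion might be the following; the task count and
the tee'd log file are illustrative assumptions, not what was actually run here.)

  $ env PMIX_DEBUG=100 srun --mpi=pmix_v2 -n 2 ./test_mpi 2>&1 | tee pmix_debug.log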

Here is that output, which seems little changed from what was before.  I
include only that from the first communicator, as it repeats almost
verbatim for the others.

srun: Step created for job 99
[cav02.arc-ts.umich.edu:41373] psec: native init
[cav02.arc-ts.umich.edu:41373] psec: none init
[cav02.arc-ts.umich.edu:41374] psec: native init
[cav02.arc-ts.umich.edu:41374] psec: none init
[cav02.arc-ts.umich.edu:41373] pmix: init called
[cav02.arc-ts.umich.edu:41373] PMIX ERROR: OUT-OF-RESOURCE in file
client/pmix_client.c at line 234
[cav02.arc-ts.umich.edu:41373] OPAL ERROR: Error in file pmix2x_client.c at
line 109
[cav02.arc-ts.umich.edu:41374] pmix: init called
[cav02.arc-ts.umich.edu:41374] PMIX ERROR: OUT-OF-RESOURCE in file
client/pmix_client.c at line 234
[cav02.arc-ts.umich.edu:41374] OPAL ERROR: Error in file pmix2x_client.c at
line 109
--
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
--


The second through fourth also have a line about

[cav02.arc-ts.umich.edu:41373] Local abort before MPI_INIT completed
completed successfully, but am not able to aggregate error messages, and
not able to guarantee that all other processes were killed!


> 2. Can you set SlurmdDebug option in slurm.conf to 10, run the test and
> provide the content of slurmd.log?
>

I will reply separately with this, as I have to coordinate with the cluster
administrator, who is not in yet.
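
(For anyone following along, the change Artem is asking for amounts to something
like the sketch below; the slurm.conf path, the named debug level, and the log
location are assumptions for a typical site and may differ here.)

  # in /opt/slurm/etc/slurm.conf (older Slurm also accepts a number such as 10):
  SlurmdDebug=debug5
  # then re-read the config (or restart slurmd) on the compute node and watch the log:
  $ scontrol reconfigure
  $ tail -f /var/log/slurm/slurmd.log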

Please note, also, that I was able to build this successfully after installing
the hwloc-devel package and adding the --disable-dlopen and --enable-shared
options to configure.
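
(A rebuild along those lines might look like the sketch below; the prefix and the
spelling of the external-PMIx flag are assumptions to be checked against
./configure --help, not the actual command used.)

  $ ./configure --prefix=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0 \
        --with-pmix=/opt/pmix/2.0.2 \
        --with-hwloc=external \
        --disable-dlopen --enable-shared
  $ make -j 8 all && make install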

Thanks,
-- bennet




Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-08 Thread Bennet Fauber
Artem,

Please find attached the gzipped slurmd.log with the entries from the
failed job's run.

-- bennet




On Fri, Jun 8, 2018 at 7:53 AM Bennet Fauber  wrote:

> Hi, Artem,
>
> Thanks for the reply.  I'll answer a couple of questions inline below.
>
> One odd thing that I see in the error output that you have provided is
> that pmix2x_client.c is active.
>
>> Looking into the v3.1.x branch (
>> https://github.com/open-mpi/ompi/tree/v3.1.x/opal/mca/pmix) I see the
>> following components:
>> * ext1x
>> * ext2x
>> ...
>> *pmix2x
>>
>> Pmix2x_client is in the internal pmix2x component, which shouldn't be built
>> if the external ext2x component was configured. At least that was the case before.
>> According to the output it fails on PMIx_Init().
>> Can you please do "$ ls mca_pmix_*" in the /lib/openmpi
>> directory?
>>
>
> $ ls mca_pmix*
> mca_pmix_flux.la  mca_pmix_isolated.la  mca_pmix_pmix2x.la
> mca_pmix_flux.so  mca_pmix_isolated.so  mca_pmix_pmix2x.so
>
> Another thing that caught my eye: you say that OMPI searches for PMIx 3.x:
>> ...
>> > It fails on the test for PMIx 3, which is expected, but then
>> > reports
>> >
>> >
>> > configure:12843: checking version 2x
>> > configure:12861: gcc -E -I/opt/pmix/2.0.2/include  conftest.c
>> > configure:12861: $? = 0
>> > configure:12862: result: found
>> >
>>
>> But OMPI v3.1.x doesn't have such a component. Can you provide the
>> related lines from config.log?
>>
>
>
> Here are the relevant lines.
>
> configure:12680: checking if user requested external PMIx
> support(/opt/pmix/2.0.2)
> configure:12690: result: yes
> configure:12701: checking --with-external-pmix value
> configure:12725: result: sanity check ok (/opt/pmix/2.0.2/include)
> configure:12768: checking libpmix.* in /opt/pmix/2.0.2/lib64
> configure:12774: checking libpmix.* in /opt/pmix/2.0.2/lib
> configure:12794: checking PMIx version
> configure:12804: result: version file found
> configure:12812: checking version 3x
> configure:12830: gcc -E -I/opt/pmix/2.0.2/include  conftest.c
> conftest.c:95:56: error: #error "not version 3"
>
> I believe that is a red herring.  Some time in the past, I was told that
> there is an anticipatory test for pmix3, and since there isn't such a
> thing, this is expected to fail.
>
>
>
>> Now about debugging of what is happening:
>> 1. I'd like to see results with PMIx debug on:
>> $ env PMIX_DEBUG=100 srun --mpi=pmix_v2 ...
>>
>
> Here is that output, which seems little changed from what was before.  I
> include only that from the first communicator, as it repeats almost
> verbatim for the others.
>
> srun: Step created for job 99
> [cav02.arc-ts.umich.edu:41373] psec: native init
> [cav02.arc-ts.umich.edu:41373] psec: none init
> [cav02.arc-ts.umich.edu:41374] psec: native init
> [cav02.arc-ts.umich.edu:41374] psec: none init
> [cav02.arc-ts.umich.edu:41373] pmix: init called
> [cav02.arc-ts.umich.edu:41373] PMIX ERROR: OUT-OF-RESOURCE in file
> client/pmix_client.c at line 234
> [cav02.arc-ts.umich.edu:41373] OPAL ERROR: Error in file pmix2x_client.c
> at line 109
> [cav02.arc-ts.umich.edu:41374] pmix: init called
> [cav02.arc-ts.umich.edu:41374] PMIX ERROR: OUT-OF-RESOURCE in file
> client/pmix_client.c at line 234
> [cav02.arc-ts.umich.edu:41374] OPAL ERROR: Error in file pmix2x_client.c
> at line 109
> --
> The application appears to have been direct launched using "srun",
> but OMPI was not built with SLURM's PMI support and therefore cannot
> execute. There are several options for building PMI support under
> SLURM, depending upon the SLURM version you are using:
>
>   version 16.05 or later: you can use SLURM's PMIx support. This
>   requires that you configure and build SLURM --with-pmix.
>
>   Versions earlier than 16.05: you must use either SLURM's PMI-1 or
>   PMI-2 support. SLURM builds PMI-1 by default, or you can manually
>   install PMI-2. You must then build Open MPI using --with-pmi pointing
>   to the SLURM PMI library location.
>
> Please configure as appropriate and try again.
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> --
>
>
> The second through fourth also have a line about
>
> [cav02.arc-ts.umich.edu:41373] Local abort before MPI_INIT completed
> completed successfully, but am not able to aggregate error messages, and
> not able to guarantee that all other processes were killed!
>
>
>> 2. Can you set SlurmdDebug option in slurm.conf to 10, run the test and
>> provide the content of slurmd.log?
>>
>
> I will reply separately with this, as I have to coordinate with the
> cluster administrator, who is not in yet.
>
> Please note, also, that I was able to build this successfully after
> installing the hwloc-devel package and adding the --disable-dlopen and
> --enable-shared options to configure.

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-08 Thread Bennet Fauber
Further testing shows that the failure to find the hwloc-devel files seems to
have been the cause.  I compiled and ran without the additional configure
flags, and it still seems to work.

I think it issued a two-line warning about this.  Is that something that
should result in an error if --with-hwloc=external is specified but not
found?  Just a thought.
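
(A quick pre-flight check for the headers might look like the following; the
RHEL/CentOS package name and the header location are the usual ones and are
assumptions on my part.)

  $ rpm -q hwloc-devel || sudo yum install hwloc-devel
  $ ls /usr/include/hwloc.h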

My immediate problem is solved. Thanks very much Ralph and Artem for your
help!

-- bennet


On Thu, Jun 7, 2018 at 11:06 AM r...@open-mpi.org  wrote:

> Odd - Artem, do you have any suggestions?
>
> > On Jun 7, 2018, at 7:41 AM, Bennet Fauber  wrote:
> >
> > Thanks, Ralph,
> >
> > I just tried it with
> >
> >srun --mpi=pmix_v2 ./test_mpi
> >
> > and got these messages
> >
> >
> > srun: Step created for job 89
> > [cav02.arc-ts.umich.edu:92286] PMIX ERROR: OUT-OF-RESOURCE in file
> > client/pmix_client.c at line 234
> > [cav02.arc-ts.umich.edu:92286] OPAL ERROR: Error in file
> > pmix2x_client.c at line 109
> > [cav02.arc-ts.umich.edu:92287] PMIX ERROR: OUT-OF-RESOURCE in file
> > client/pmix_client.c at line 234
> > [cav02.arc-ts.umich.edu:92287] OPAL ERROR: Error in file
> > pmix2x_client.c at line 109
> >
> --
> > The application appears to have been direct launched using "srun",
> > but OMPI was not built with SLURM's PMI support and therefore cannot
> > execute. There are several options for building PMI support under
> > SLURM, depending upon the SLURM version you are using:
> >
> >  version 16.05 or later: you can use SLURM's PMIx support. This
> >  requires that you configure and build SLURM --with-pmix.
> >
> >  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
> >  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
> >  install PMI-2. You must then build Open MPI using --with-pmi pointing
> >  to the SLURM PMI library location.
> >
> > Please configure as appropriate and try again.
> >
> --
> >
> >
> > Just to be complete, I checked the library path,
> >
> >
> > $ ldconfig -p | egrep 'slurm|pmix'
> >libpmi2.so.1 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi2.so.1
> >libpmi2.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi2.so
> >libpmix.so.2 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmix.so.2
> >libpmix.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmix.so
> >libpmi.so.1 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi.so.1
> >libpmi.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi.so
> >
> >
> > and libpmi* does appear there.
> >
> >
> > I also tried explicitly listing the slurm directory from the slurm
> > library installation in LD_LIBRARY_PATH, just in case it wasn't
> > traversing correctly.  that is, both
> >
> > $ echo $LD_LIBRARY_PATH
> >
> /sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0/lib:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib64:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib:/opt/slurm/lib64:/sw/arcts/centos7/hpc-utils/lib
> >
> > and
> >
> > $ echo $LD_LIBRARY_PATH
> >
> /opt/slurm/lib64/slurm:/opt/pmix/2.0.2/lib:/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0/lib:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib64:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib:/opt/slurm/lib64:/sw/arcts/centos7/hpc-utils/lib
> >
> >
> > I don't have a saved build log, but I can rebuild this and save the
> > build logs, in case any information in those logs would help.
> >
> > I will also mention that we have, in the past, used the
> > --disable-dlopen and --enable-shared flags, which we did not use here.
> > Just in case that makes any difference.
> >
> > -- bennet
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Jun 7, 2018 at 10:01 AM, r...@open-mpi.org 
> wrote:
> >> I think you need to set your MPIDefault to pmix_v2 since you are using
> a PMIx v2 library
> >>
> >>
> >>> On Jun 7, 2018, at 6:25 AM, Bennet Fauber  wrote:
> >>>
> >>> Hi, Ralph,
> >>>
> >>> Thanks for the reply, and sorry for the missing information.  I hope
> >>> this fills in the picture better.
> >>>
> >>> $ srun --version
> >>> slurm 17.11.7
> >>>
> >>> $ srun --mpi=list
> >>> srun: MPI types are...
> >>> srun: pmix_v2
> >>> srun: openmpi
> >>> srun: none
> >>> srun: pmi2
> >>> srun: pmix
> >>>
> >>> We have pmix configured as the default in /opt/slurm/etc/slurm.conf
> >>>
> >>>   MpiDefault=pmix
> >>>
> >>> and on the x86_64 system configured the same way, a bare 'srun
> >>> ./test_mpi' is sufficient and runs.
> >>>
> >>> I have tried all of the following srun variations with no joy
> >>>
> >>>
> >>> srun ./test_mpi
> >>> srun --mpi=pmix ./test_mpi
> >>> srun --mpi=pmi2 ./test_mpi
> >>> srun --mpi=openmpi ./test_mpi
> >>>
> >>>
> >>> I believe we are using the spec files that come with both pmix and
> >>> with slurm, and the following to build the .rpm files used at
> >>> installation
> >>>
> >>>
> >>> $ rpmbuild --define '_pr

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-08 Thread r...@open-mpi.org


> On Jun 8, 2018, at 8:10 AM, Bennet Fauber  wrote:
> 
> Further testing shows that it was the failure to find the hwloc-devel files 
> that seems to be the cause of the failure.  I compiled and ran without the 
> additional configure flags, and it still seems to work.
> 
> I think it issued a two-line warning about this.  Is that something that 
> should result in an error if --with-hwloc=external is specified but not 
> found?  Just a thought.

Yes - that is a bug in our configury. It should have immediately error’d out.

> 
> My immediate problem is solved. Thanks very much Ralph and Artem for your 
> help!
> 
> -- bennet
> 
> 
> On Thu, Jun 7, 2018 at 11:06 AM r...@open-mpi.org wrote:
> Odd - Artem, do you have any suggestions?
> 
> > On Jun 7, 2018, at 7:41 AM, Bennet Fauber wrote:
> > 
> > Thanks, Ralph,
> > 
> > I just tried it with
> > 
> >srun --mpi=pmix_v2 ./test_mpi
> > 
> > and got these messages
> > 
> > 
> > srun: Step created for job 89
> > [cav02.arc-ts.umich.edu:92286] PMIX ERROR: OUT-OF-RESOURCE in file
> > client/pmix_client.c at line 234
> > [cav02.arc-ts.umich.edu:92286] OPAL ERROR: Error in file
> > pmix2x_client.c at line 109
> > [cav02.arc-ts.umich.edu:92287] PMIX ERROR: OUT-OF-RESOURCE in file
> > client/pmix_client.c at line 234
> > [cav02.arc-ts.umich.edu:92287] OPAL ERROR: Error in file
> > pmix2x_client.c at line 109
> > --
> > The application appears to have been direct launched using "srun",
> > but OMPI was not built with SLURM's PMI support and therefore cannot
> > execute. There are several options for building PMI support under
> > SLURM, depending upon the SLURM version you are using:
> > 
> >  version 16.05 or later: you can use SLURM's PMIx support. This
> >  requires that you configure and build SLURM --with-pmix.
> > 
> >  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
> >  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
> >  install PMI-2. You must then build Open MPI using --with-pmi pointing
> >  to the SLURM PMI library location.
> > 
> > Please configure as appropriate and try again.
> > --
> > 
> > 
> > Just to be complete, I checked the library path,
> > 
> > 
> > $ ldconfig -p | egrep 'slurm|pmix'
> >libpmi2.so.1 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi2.so.1
> >libpmi2.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi2.so
> >libpmix.so.2 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmix.so.2
> >libpmix.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmix.so
> >libpmi.so.1 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi.so.1
> >libpmi.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi.so
> > 
> > 
> > and libpmi* does appear there.
> > 
> > 
> > I also tried explicitly listing the slurm directory from the slurm
> > library installation in LD_LIBRARY_PATH, just in case it wasn't
> > traversing correctly.  that is, both
> > 
> > $ echo $LD_LIBRARY_PATH
> > /sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0/lib:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib64:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib:/opt/slurm/lib64:/sw/arcts/centos7/hpc-utils/lib
> > 
> > and
> > 
> > $ echo $LD_LIBRARY_PATH
> > /opt/slurm/lib64/slurm:/opt/pmix/2.0.2/lib:/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0/lib:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib64:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib:/opt/slurm/lib64:/sw/arcts/centos7/hpc-utils/lib
> > 
> > 
> > I don't have a saved build log, but I can rebuild this and save the
> > build logs, in case any information in those logs would help.
> > 
> > I will also mention that we have, in the past, used the
> > --disable-dlopen and --enable-shared flags, which we did not use here.
> > Just in case that makes any difference.
> > 
> > -- bennet
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > On Thu, Jun 7, 2018 at 10:01 AM, r...@open-mpi.org wrote:
> >> I think you need to set your MPIDefault to pmix_v2 since you are using a 
> >> PMIx v2 library
> >> 
> >> 
> >>> On Jun 7, 2018, at 6:25 AM, Bennet Fauber wrote:
> >>> 
> >>> Hi, Ralph,
> >>> 
> >>> Thanks for the reply, and sorry for the missing information.  I hope
> >>> this fills in the picture better.
> >>> 
> >>> $ srun --version
> >>> slurm 17.11.7
> >>> 
> >>> $ srun --mpi=list
> >>> srun: MPI types are...
> >>> srun: pmix_v2
> >>> srun: openmpi
> >>> srun: none
> >>> srun: pmi2
> >>> srun: pmix
> >>> 
> >>> We have pmix configured as the default in /opt/slurm/etc/slurm.conf
> >>> 
> >>>   M

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-08 Thread Jeff Squyres (jsquyres) via users
Hmm.  I'm confused -- can we clarify?

I just tried configuring Open MPI v3.1.0 on a RHEL 7.4 system with the RHEL 
hwloc RPM installed, but *not* the hwloc-devel RPM.  Hence, no hwloc.h (for 
example).

When specifying an external hwloc, configure did fail, as expected:

-
$ ./configure --with-hwloc=external ...
...

+++ Configuring MCA framework hwloc
checking for no configure components in framework hwloc... 
checking for m4 configure components in framework hwloc... external, hwloc1117

--- MCA component hwloc:external (m4 configuration macro, priority 90)
checking for MCA component hwloc:external compile mode... static
checking --with-hwloc-libdir value... simple ok (unspecified value)
checking looking for external hwloc in... (default search paths)
checking hwloc.h usability... no
checking hwloc.h presence... no
checking for hwloc.h... no
checking if MCA component hwloc:external can compile... no
configure: WARNING: MCA component "external" failed to configure properly
configure: WARNING: This component was selected as the default
configure: error: Cannot continue
$
---

Are you seeing something different?



> On Jun 8, 2018, at 11:16 AM, r...@open-mpi.org wrote:
> 
> 
> 
>> On Jun 8, 2018, at 8:10 AM, Bennet Fauber  wrote:
>> 
>> Further testing shows that it was the failure to find the hwloc-devel files 
>> that seems to be the cause of the failure.  I compiled and ran without the 
>> additional configure flags, and it still seems to work.
>> 
>> I think it issued a two-line warning about this.  Is that something that 
>> should result in an error if --with-hwloc=external is specified but not 
>> found?  Just a thought.
> 
> Yes - that is a bug in our configury. It should have immediately error’d out.
> 
>> 
>> My immediate problem is solved. Thanks very much Ralph and Artem for your 
>> help!
>> 
>> -- bennet
>> 
>> 
>> On Thu, Jun 7, 2018 at 11:06 AM r...@open-mpi.org  wrote:
>> Odd - Artem, do you have any suggestions?
>> 
>> > On Jun 7, 2018, at 7:41 AM, Bennet Fauber  wrote:
>> > 
>> > Thanks, Ralph,
>> > 
>> > I just tried it with
>> > 
>> >srun --mpi=pmix_v2 ./test_mpi
>> > 
>> > and got these messages
>> > 
>> > 
>> > srun: Step created for job 89
>> > [cav02.arc-ts.umich.edu:92286] PMIX ERROR: OUT-OF-RESOURCE in file
>> > client/pmix_client.c at line 234
>> > [cav02.arc-ts.umich.edu:92286] OPAL ERROR: Error in file
>> > pmix2x_client.c at line 109
>> > [cav02.arc-ts.umich.edu:92287] PMIX ERROR: OUT-OF-RESOURCE in file
>> > client/pmix_client.c at line 234
>> > [cav02.arc-ts.umich.edu:92287] OPAL ERROR: Error in file
>> > pmix2x_client.c at line 109
>> > --
>> > The application appears to have been direct launched using "srun",
>> > but OMPI was not built with SLURM's PMI support and therefore cannot
>> > execute. There are several options for building PMI support under
>> > SLURM, depending upon the SLURM version you are using:
>> > 
>> >  version 16.05 or later: you can use SLURM's PMIx support. This
>> >  requires that you configure and build SLURM --with-pmix.
>> > 
>> >  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
>> >  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
>> >  install PMI-2. You must then build Open MPI using --with-pmi pointing
>> >  to the SLURM PMI library location.
>> > 
>> > Please configure as appropriate and try again.
>> > --
>> > 
>> > 
>> > Just to be complete, I checked the library path,
>> > 
>> > 
>> > $ ldconfig -p | egrep 'slurm|pmix'
>> >libpmi2.so.1 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi2.so.1
>> >libpmi2.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi2.so
>> >libpmix.so.2 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmix.so.2
>> >libpmix.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmix.so
>> >libpmi.so.1 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi.so.1
>> >libpmi.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi.so
>> > 
>> > 
>> > and libpmi* does appear there.
>> > 
>> > 
>> > I also tried explicitly listing the slurm directory from the slurm
>> > library installation in LD_LIBRARY_PATH, just in case it wasn't
>> > traversing correctly.  that is, both
>> > 
>> > $ echo $LD_LIBRARY_PATH
>> > /sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0/lib:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib64:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib:/opt/slurm/lib64:/sw/arcts/centos7/hpc-utils/lib
>> > 
>> > and
>> > 
>> > $ echo $LD_LIBRARY_PATH
>> > /opt/slurm/lib64/slurm:/opt/pmix/2.0.2/lib:/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0/lib:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib64:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib:/opt/slurm/lib64:/sw/arcts/centos7/hpc-utils/lib
>> > 
>> > 
>> > I don't have a saved build log, but I can rebuild this and save the
> >> > build logs, in case any information in those logs would help.

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-08 Thread Bennet Fauber
Jeff,

Hmm.  Maybe I had insufficient error checking in our installation process.

Can you make and make install after the configure fails?  I somehow got an
installation, despite the configure status, perhaps?

-- bennet




On Fri, Jun 8, 2018 at 11:32 AM Jeff Squyres (jsquyres) via users <
users@lists.open-mpi.org> wrote:

> Hmm.  I'm confused -- can we clarify?
>
> I just tried configuring Open MPI v3.1.0 on a RHEL 7.4 system with the
> RHEL hwloc RPM installed, but *not* the hwloc-devel RPM.  Hence, no hwloc.h
> (for example).
>
> When specifying an external hwloc, configure did fail, as expected:
>
> -
> $ ./configure --with-hwloc=external ...
> ...
>
> +++ Configuring MCA framework hwloc
> checking for no configure components in framework hwloc...
> checking for m4 configure components in framework hwloc... external,
> hwloc1117
>
> --- MCA component hwloc:external (m4 configuration macro, priority 90)
> checking for MCA component hwloc:external compile mode... static
> checking --with-hwloc-libdir value... simple ok (unspecified value)
> checking looking for external hwloc in... (default search paths)
> checking hwloc.h usability... no
> checking hwloc.h presence... no
> checking for hwloc.h... no
> checking if MCA component hwloc:external can compile... no
> configure: WARNING: MCA component "external" failed to configure properly
> configure: WARNING: This component was selected as the default
> configure: error: Cannot continue
> $
> ---
>
> Are you seeing something different?
>
>
>
> > On Jun 8, 2018, at 11:16 AM, r...@open-mpi.org wrote:
> >
> >
> >
> >> On Jun 8, 2018, at 8:10 AM, Bennet Fauber  wrote:
> >>
> >> Further testing shows that it was the failure to find the hwloc-devel
> files that seems to be the cause of the failure.  I compiled and ran
> without the additional configure flags, and it still seems to work.
> >>
> >> I think it issued a two-line warning about this.  Is that something
> that should result in an error if --with-hwloc=external is specified but
> not found?  Just a thought.
> >
> > Yes - that is a bug in our configury. It should have immediately error’d
> out.
> >
> >>
> >> My immediate problem is solved. Thanks very much Ralph and Artem for
> your help!
> >>
> >> -- bennet
> >>
> >>
> >> On Thu, Jun 7, 2018 at 11:06 AM r...@open-mpi.org 
> wrote:
> >> Odd - Artem, do you have any suggestions?
> >>
> >> > On Jun 7, 2018, at 7:41 AM, Bennet Fauber  wrote:
> >> >
> >> > Thanks, Ralph,
> >> >
> >> > I just tried it with
> >> >
> >> >srun --mpi=pmix_v2 ./test_mpi
> >> >
> >> > and got these messages
> >> >
> >> >
> >> > srun: Step created for job 89
> >> > [cav02.arc-ts.umich.edu:92286] PMIX ERROR: OUT-OF-RESOURCE in file
> >> > client/pmix_client.c at line 234
> >> > [cav02.arc-ts.umich.edu:92286] OPAL ERROR: Error in file
> >> > pmix2x_client.c at line 109
> >> > [cav02.arc-ts.umich.edu:92287] PMIX ERROR: OUT-OF-RESOURCE in file
> >> > client/pmix_client.c at line 234
> >> > [cav02.arc-ts.umich.edu:92287] OPAL ERROR: Error in file
> >> > pmix2x_client.c at line 109
> >> >
> --
> >> > The application appears to have been direct launched using "srun",
> >> > but OMPI was not built with SLURM's PMI support and therefore cannot
> >> > execute. There are several options for building PMI support under
> >> > SLURM, depending upon the SLURM version you are using:
> >> >
> >> >  version 16.05 or later: you can use SLURM's PMIx support. This
> >> >  requires that you configure and build SLURM --with-pmix.
> >> >
> >> >  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
> >> >  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
> >> >  install PMI-2. You must then build Open MPI using --with-pmi pointing
> >> >  to the SLURM PMI library location.
> >> >
> >> > Please configure as appropriate and try again.
> >> >
> --
> >> >
> >> >
> >> > Just to be complete, I checked the library path,
> >> >
> >> >
> >> > $ ldconfig -p | egrep 'slurm|pmix'
> >> >libpmi2.so.1 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi2.so.1
> >> >libpmi2.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi2.so
> >> >libpmix.so.2 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmix.so.2
> >> >libpmix.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmix.so
> >> >libpmi.so.1 (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi.so.1
> >> >libpmi.so (libc6,AArch64) => /opt/pmix/2.0.2/lib/libpmi.so
> >> >
> >> >
> >> > and libpmi* does appear there.
> >> >
> >> >
> >> > I also tried explicitly listing the slurm directory from the slurm
> >> > library installation in LD_LIBRARY_PATH, just in case it wasn't
> >> > traversing correctly.  that is, both
> >> >
> >> > $ echo $LD_LIBRARY_PATH
> >> >
> /sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0/lib:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib64:/opt/arm/gcc-7.1.0_Generic-AArc

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-08 Thread Jeff Squyres (jsquyres) via users
On Jun 8, 2018, at 11:38 AM, Bennet Fauber  wrote:
> 
> Hmm.  Maybe I had insufficient error checking in our installation process.
> 
> Can you make and make install after the configure fails?  I somehow got an 
> installation, despite the configure status, perhaps?

If it's a fresh tarball expansion that you've never built before, no (because 
there will be no Makefiles, etc.).

If you've previously built in that tree, then configure may fail, but you can 
still run "make clean all install" because the stale Makefiles (etc.) are 
still around.
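
(One way to sidestep that trap, as a sketch: build from a fresh tarball or in a
separate build directory, so a failed configure cannot be masked by stale
Makefiles.  Open MPI supports out-of-tree builds; the flags after configure are
whatever your site uses.)

  $ tar xf openmpi-3.1.0.tar.bz2
  $ mkdir openmpi-3.1.0/build && cd openmpi-3.1.0/build
  $ ../configure --with-hwloc=external ... && make -j 8 all && make install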

-- 
Jeff Squyres
jsquy...@cisco.com
