[OMPI users] Error handling

2023-07-18 Thread Alexander Stadik via users
Hey everyone,

I have been working with CUDA-aware Open MPI for quite some time now, and a while back
I developed a small exception-handling framework covering both MPI and CUDA exceptions.
Currently I am using MPI_Abort with custom error numbers to terminate everything
cleanly, which works well: in case of a crash I simply read the error number from the
logfile.

Now I am wondering how to handle return / exit codes properly across processes, since
we would like to filter non-zero exits by return code.

One way is a simple Allreduce (MPI_MIN in my case) plus exit instead of Abort. But the
problem seems to be that the resulting exit codes always look "random" (I was using
negative codes); only when using the predefined MPI error codes does it seem to work
correctly, and usage of those is limited.
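
A minimal sketch of this Allreduce-plus-exit idea (using small positive codes combined
with MPI_MAX rather than the negative-code/MPI_MIN variant described above; all names
here are illustrative assumptions, not the original code):

/* Hedged sketch: agree on one error code across all ranks, then exit with
 * it instead of calling MPI_Abort. Small positive codes (0 = success) are
 * combined with MPI_MAX so the result survives as a normal exit status. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int my_code = (rank == 1) ? 7 : 0;   /* pretend rank 1 failed with code 7 */

    int global_code = 0;
    MPI_Allreduce(&my_code, &global_code, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);

    MPI_Finalize();
    exit(global_code);                   /* every rank exits with the same code */
}

If the launcher propagates that status (behavior varies between launchers), a wrapper
script can then filter runs by exit code, which is the stated goal.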

Any suggestions on how to do this / how it can work properly?

BR Alex




DI Alexander Stadik

Head of Large Scale Solutions
Research & Development | Large Scale Solutions


Phone:  +4372522044622
Company: +43725220446

Mail: alexander.sta...@essteyr.com


Register of Firms No.: FN 427703 a
Commercial Court: District Court Steyr
UID: ATU69213102




ESS Engineering Software Steyr GmbH * Berggasse 35 * 4400 * Steyr * Austria









Re: [OMPI users] Error build Open MPI 4.1.5 with GCC 11.3

2023-07-18 Thread Jeffrey Layton via users
Jeff,

Thanks for the tip - it started me thinking a bit.

I was using a directory in my /home account with 4.1.5 that I had
previously built using GCC 9.4 (Ubuntu 20.04). I rebuilt the system with
Ubuntu 22.04, but I had made a backup of /home, and I then copied the 4.1.5
directory back into /home.

I checked and I did a "make clean" before attempting to build 4.1.5 but
with GCC 11.3 that came with Ubuntu 22.04. In fact, I did it several times
before I ran configure.

Even after running "make clean" I got the error I mentioned in my initial
post. This happened several times.

This morning, I blew away my 4.1.5 directory and downloaded a fresh 4.1.5.
Configure went fine as did compiling it.

My theory is that some cruft from building 4.1.5 with the GCC 9.4 compilers
hung around, even after "make clean". A "fresh" download of 4.1.5 did
not include this "cruft", so configure and "make all" proceed just fine.

I don't know if this is correct and I can't point to any smoking gun though.

Thanks!

Jeff


On Mon, Jul 17, 2023 at 2:53 PM Jeff Squyres (jsquyres) 
wrote:

> That's a little odd.  Usually, the specific .h files that are listed as
> dependencies came from *somewhere* -- usually either part of the GNU
> Autotools dependency analysis.
>
> I'm guessing that /usr/lib/gcc/x86_64-linux-gnu/9/include/float.h doesn't
> actually exist on your system -- but then how did it get into Open MPI's
> makefiles?
>
> Did you run configure on one machine and make on a different machine,
> perchance?
> --
> *From:* users  on behalf of Jeffrey
> Layton via users 
> *Sent:* Monday, July 17, 2023 2:05 PM
> *To:* Open MPI Users 
> *Cc:* Jeffrey Layton 
> *Subject:* [OMPI users] Error build Open MPI 4.1.5 with GCC 11.3
>
> Good afternoon,
>
> I'm trying to build Open MPI 4.1.5 using GCC 11.3. However, I get an error
> that I'm not sure how to correct. The error is,
>
> ...
>   CC   pscatter.lo
>   CC   piscatter.lo
>   CC   pscatterv.lo
>   CC   piscatterv.lo
>   CC   psend.lo
>   CC   psend_init.lo
>   CC   psendrecv.lo
>   CC   psendrecv_replace.lo
>   CC   pssend_init.lo
>   CC   pssend.lo
>   CC   pstart.lo
>   CC   pstartall.lo
>   CC   pstatus_c2f.lo
>   CC   pstatus_f2c.lo
>   CC   pstatus_set_cancelled.lo
>   CC   pstatus_set_elements.lo
>   CC   pstatus_set_elements_x.lo
>   CC   ptestall.lo
>   CC   ptestany.lo
>   CC   ptest.lo
>   CC   ptest_cancelled.lo
>   CC   ptestsome.lo
>   CC   ptopo_test.lo
>   CC   ptype_c2f.lo
>   CC   ptype_commit.lo
>   CC   ptype_contiguous.lo
>   CC   ptype_create_darray.lo
> make[3]: *** No rule to make target
> '/usr/lib/gcc/x86_64-linux-gnu/9/include/float.h', needed by
> 'ptype_create_f90_complex.lo'.  Stop.
> make[3]: Leaving directory
> '/home/laytonjb/src/openmpi-4.1.5/ompi/mpi/c/profile'
> make[2]: *** [Makefile:2559: all-recursive] Error 1
> make[2]: Leaving directory '/home/laytonjb/src/openmpi-4.1.5/ompi/mpi/c'
> make[1]: *** [Makefile:3566: all-recursive] Error 1
> make[1]: Leaving directory '/home/laytonjb/src/openmpi-4.1.5/ompi'
> make: *** [Makefile:1912: all-recursive] Error 1
>
>
>
> Here is the configuration output from configure:
>
> Open MPI configuration:
> ---
> Version: 4.1.5
> Build MPI C bindings: yes
> Build MPI C++ bindings (deprecated): no
> Build MPI Fortran bindings: mpif.h, use mpi, use mpi_f08
> MPI Build Java bindings (experimental): no
> Build Open SHMEM support: false (no spml)
> Debug build: no
> Platform file: (none)
>
> Miscellaneous
> ---
> CUDA support: no
> HWLOC support: external
> Libevent support: internal
> Open UCC: no
> PMIx support: Internal
>
> Transports
> ---
> Cisco usNIC: no
> Cray uGNI (Gemini/Aries): no
> Intel Omnipath (PSM2): no
> Intel TrueScale (PSM): no
> Mellanox MXM: no
> Open UCX: no
> OpenFabrics OFI Libfabric: no
> OpenFabrics Verbs: no
> Portals4: no
> Shared memory/copy in+copy out: yes
> Shared memory/Linux CMA: yes
> Shared memory/Linux KNEM: no
> Shared memory/XPMEM: no
> TCP: yes
>
> Resource Managers
> ---
> Cray Alps: no
> Grid Engine: no
> LSF: no
> Moab: no
> Slurm: yes
> ssh/rsh: yes
> Torque: no
>
> OMPIO File Systems
> ---
> DDN Infinite Memory Engine: no
> Generic Unix FS: yes
> IBM Spectrum Scale/GPFS: no
> Lustre: no
> PVFS2/OrangeFS: no
>
>
>
> Any suggestions! Thanks!
>
> Jeff
>
>
>
>


Re: [OMPI users] Error build Open MPI 4.1.5 with GCC 11.3

2023-07-18 Thread Jeff Squyres (jsquyres) via users
There were probably quite a few differences in the output of "configure" 
between GCC 9.4 and GCC 11.3.

For example, your original post cited 
"/usr/lib/gcc/x86_64-linux-gnu/9/include/float.h", which, I assume, does not 
exist on your new GCC 11.3-based system.

Meaning: if you had run make clean and then re-ran configure, it probably would 
have built ok.  But deleting the whole source tree and re-configuring + 
re-building also works.  🙂


Re: [OMPI users] Error handling

2023-07-18 Thread George Bosilca via users
Alex,

How are your values "random" if you provide correct values? Even for
negative values you could use MIN to pick one value and return it. What is
the problem with `MPI_Abort`? It does seem to do what you want.

  George.
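
For reference, a hedged sketch of the MPI_Abort pattern under discussion (the error-code
name and value are hypothetical, not taken from the original code):

/* Terminate all ranks with a custom error number. The errorcode is what
 * the MPI implementation attempts to hand back to the invoking
 * environment; whether and how the launcher turns it into an exit status
 * is implementation-dependent. */
#include <mpi.h>

#define MYAPP_ERR_CUDA 3   /* hypothetical custom error number */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int something_failed = 1;            /* stand-in for a real error check */
    if (something_failed)
        MPI_Abort(MPI_COMM_WORLD, MYAPP_ERR_CUDA);

    MPI_Finalize();
    return 0;
}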




Re: [OMPI users] Error build Open MPI 4.1.5 with GCC 11.3

2023-07-18 Thread Jeffrey Layton via users
As soon as you pointed out /usr/lib/gcc/x86_64-linux-gnu/9/include/float.h,
it made me think of the previous build.

I did "make clean" a _bunch_ of times before running configure and it
didn't cure it. Strange.

But nuking the source tree from orbit, just to be sure, and then running
configure/rebuild worked just great!

Thanks!

Jeff



Re: [OMPI users] Error build Open MPI 4.1.5 with GCC 11.3

2023-07-18 Thread Jeff Squyres (jsquyres) via users
The GNU-generated Makefile dependencies may not be removed during "make clean" 
-- they may only be removed during "make distclean" (which is kinda equivalent 
to rm -rf'ing the tree and extracting a fresh tarball).


Re: [OMPI users] Error build Open MPI 4.1.5 with GCC 11.3

2023-07-18 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) via users
I don't know how openmpi does it, but I've definitely seen packages where "make 
clean" wipes the ".o" files but not the results of the configure process.  
Sometimes there's a "make distclean" which tries to get back closer to 
as-untarred state.

Noam


Re: [OMPI users] Error build Open MPI 4.1.5 with GCC 11.3

2023-07-18 Thread Tom Kacvinsky via users
On Jul 18, 2023, at 16:05, Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) via users wrote:

> I don't know how openmpi does it, but I've definitely seen packages where "make clean"
> wipes the ".o" files but not the results of the configure process. Sometimes there's a
> "make distclean" which tries to get back closer to the as-untarred state.

With autotools-based projects I've taken to doing out-of-source-tree builds. Then I can
just blow away the directory where I did the configure/build.



Re: [OMPI users] [EXT] Re: Error handling

2023-07-18 Thread Alexander Stadik via users
Hey George,

I said "random" only because I do not see the method behind it: when I do the Allreduce
with MIN and return a negative number, I usually get 248, 253, 11, or 6 as the exit
code. So the value seems to come purely from the MPI side.

The problem with MPI_Abort is that it shows the correct number in its output in the
logfile, but it does not communicate its value to the other processes or forward it as
the exit code. So one again always sees these "random" values.

When using positive numbers in range it seems to work, so my question is how this works
and how one can do it properly. Is there a way to let MPI_Abort communicate the value as
the exit code?
Why do negative numbers not work, or does one simply always have to use positive
numbers? The reason I would prefer Abort is that it seems safer.

BR Alex
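
A plausible explanation for those specific values, assuming the codes end up as ordinary
POSIX exit statuses: only the low 8 bits of an exit status survive, so negative values
wrap into the range 0-255 (for example, -8 shows up as 248 and -3 as 253), while the
predefined MPI error codes are typically small non-negative integers and pass through
unchanged. A tiny illustration:

/* Hedged illustration: a process exit status keeps only its low 8 bits,
 * so negative error codes wrap around into 0..255. */
#include <stdio.h>

int main(void)
{
    int codes[] = { -1, -3, -8, 7 };
    for (int i = 0; i < 4; i++)
        printf("exit(%d) is observed as %d\n", codes[i], codes[i] & 0xFF);
    /* prints 255, 253, 248, and 7 */
    return 0;
}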


