[OMPI users] Segmentation fault

2023-08-09 Thread Aziz Ogutlu via users

Hi there all,

We're using SU2 with Open MPI 4.0.3 and GCC 8.5.0 on Red Hat 7.9. We compiled 
all components for use on our HPC system.


When I run SU2 with the QuickStart config file under Open MPI, it gives the 
error shown in the attached file.

The command is:
|mpirun -np 8 --allow-run-as-root SU2_CFD inv_NACA0012.cfg|

--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  www.eduline.com.tr
Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72
[headnode:3423 :0:3423] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x30)
[headnode:3426 :0:3426] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x30)
[headnode:3424 :0:3424] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x30)
[headnode:3427 :0:3427] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x30)
[headnode:3420 :0:3420] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x30)
[headnode:3425 :0:3425] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x30)
[headnode:3422 :0:3422] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x30)
[headnode:3421 :0:3421] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x30)
 backtrace (tid:   3421) 
 0 0x0004d455 ucs_debug_print_backtrace()  ???:0
 1 0x00043b08 ompi_file_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/file/file.c:153
 2 0x00043b08 ompi_file_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/file/file.c:153
 3 0x00064336 PMPI_File_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/mpi/c/profile/pfile_close.c:60
 4 0x005ff508 CFileWriter::OpenMPIFile()  ???:0
 5 0x0060f4ce CSU2BinaryFileWriter::Write_Data()  ???:0
 6 0x005aa7a0 COutput::WriteToFile()  ???:0
 7 0x005ac166 COutput::SetResult_Files()  ???:0
 8 0x004fbf02 CSinglezoneDriver::Output()  ???:0
 9 0x004fc86a CSinglezoneDriver::StartSolver()  ???:0
10 0x00472c29 main()  ???:0
11 0x00022555 __libc_start_main()  ???:0
12 0x00494f87 _start()  ???:0
 backtrace (tid:   3427) 
 0 0x0004d455 ucs_debug_print_backtrace()  ???:0
 1 0x00043b08 ompi_file_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/file/file.c:153
 2 0x00043b08 ompi_file_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/file/file.c:153
 3 0x00064336 PMPI_File_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/mpi/c/profile/pfile_close.c:60
 4 0x005ff508 CFileWriter::OpenMPIFile()  ???:0
 5 0x0060f4ce CSU2BinaryFileWriter::Write_Data()  ???:0
 6 0x005aa7a0 COutput::WriteToFile()  ???:0
 7 0x005ac166 COutput::SetResult_Files()  ???:0
 8 0x004fbf02 CSinglezoneDriver::Output()  ???:0
 9 0x004fc86a CSinglezoneDriver::StartSolver()  ???:0
10 0x00472c29 main()  ???:0
11 0x00022555 __libc_start_main()  ???:0
12 0x00494f87 _start()  ???:0
=
 backtrace (tid:   3425) 
 0 0x0004d455 ucs_debug_print_backtrace()  ???:0
 1 0x00043b08 ompi_file_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/file/file.c:153
 2 0x00043b08 ompi_file_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/file/file.c:153
 3 0x00064336 PMPI_File_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/mpi/c/profile/pfile_close.c:60
 4 0x005ff508 CFileWriter::OpenMPIFile()  ???:0
 5 0x0060f4ce CSU2BinaryFileWriter::Write_Data()  ???:0
 6 0x005aa7a0 COutput::WriteToFile()  ???:0
 7 0x005ac166 COutput::SetResult_Files()  ???:0
 8 0x004fbf02 CSinglezoneDriver::Output()  ???:0
 9 0x004fc86a CSinglezoneDriver::StartSolver()  ???:0
10 0x00472c29 main()  ???:0
11 0x00022555 __libc_start_main()  ???:0
12 0x00494f87 _start()  ???:0
=
 backtrace (tid:   3423) 
 0 0x0004d455 ucs_debug_print_backtrace()  ???:0
 1 0x00043b08 ompi_file_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/file/file.c:153
 2 0x00043b08 ompi_file_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/file/file.c:153
 3 0x00064336 PMPI_File_close()  
/var/tmp/OFED_topdir/BUILD/openmpi-4.0.3rc4/ompi/mpi/c/profile/pfile_close.c:60
 4 0x005ff508 CFileWriter::OpenMPIFile()  ???:0
 5 0x0060f4ce CSU2BinaryFileWriter::Write_Data()  ???:0
 6 0x005aa7a0 COutput::WriteToFile()  ???:0
 7 0x005ac166 COutput::SetResult_Files()  ???:0
 8 0x004fbf02 CSinglezoneDriver::Output()  ???:0
 9 0x004fc86a CSinglezoneDriver::StartSolver()  ???:0
10 0x00472c29 main()  ???:0
11 0x000

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
I'm afraid I don't know anything about the SU2 application.

You are using Open MPI v4.0.3, which is fairly old.  Many bug fixes have been 
released since that version.  Can you upgrade to the latest version of Open MPI 
(v4.1.5)?

From: users  on behalf of Aziz Ogutlu via 
users 
Sent: Wednesday, August 9, 2023 3:26 AM
To: Open MPI Users 
Cc: Aziz Ogutlu 
Subject: [OMPI users] Segmentation fault


Hi there all,

We're using SU2 with Open MPI 4.0.3 and GCC 8.5.0 on Red Hat 7.9. We compiled 
all components for use on our HPC system.

When I run SU2 with the QuickStart config file under Open MPI, it gives the 
error shown in the attached file.
The command is:
mpirun -np 8 --allow-run-as-root SU2_CFD inv_NACA0012.cfg

--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  
www.eduline.com.tr
Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72


Re: [OMPI users] Segmentation fault

2023-08-09 Thread Aziz Ogutlu via users

Hi Jeff,

I also tried with Open MPI 4.1.5 and got the same error.


On 8/9/23 17:05, Jeff Squyres (jsquyres) wrote:

I'm afraid I don't know anything about the SU2 application.

You are using Open MPI v4.0.3, which is fairly old.  Many bug fixes 
have been released since that version.  Can you upgrade to the latest 
version of Open MPI (v4.1.5)?


*From:* users  on behalf of Aziz 
Ogutlu via users 

*Sent:* Wednesday, August 9, 2023 3:26 AM
*To:* Open MPI Users 
*Cc:* Aziz Ogutlu 
*Subject:* [OMPI users] Segmentation fault

Hi there all,

We're using SU2 with Open MPI 4.0.3 and GCC 8.5.0 on Red Hat 7.9. We 
compiled all components for use on our HPC system.


When I run SU2 with the QuickStart config file under Open MPI, it gives 
the error shown in the attached file.

The command is:
|mpirun -np 8 --allow-run-as-root SU2_CFD inv_NACA0012.cfg|

--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  www.eduline.com.tr

Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72


--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  www.eduline.com.tr
Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72


Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
Ok, thanks for upgrading.  Are you also using the latest version of SU2?

Without knowing what that application is doing, it's a little hard to debug the 
issue from our side.  At first glance, it looks like it is crashing when it has 
completed writing a file and is attempting to close it.  But the pointer that 
Open MPI got to close the file looks like it is bogus (i.e., 0x30 instead of a 
real pointer value).
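
For illustration only (a hypothetical sketch, not the SU2 code), this is the kind of 
usage that matters here: if MPI_File_open fails, the handle is not valid, and passing 
it to MPI_File_close anyway (or closing a handle twice) hands the library a garbage 
pointer, which typically crashes deep inside ompi_file_close() rather than in the 
application code.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_File fh;
    int rc = MPI_File_open(MPI_COMM_WORLD, "/nonexistent/dir/out.dat",
                           MPI_MODE_CREATE | MPI_MODE_WRONLY,
                           MPI_INFO_NULL, &fh);

    if (rc != MPI_SUCCESS) {
        /* fh is not a valid handle here; calling MPI_File_close(&fh) anyway
           (or closing a valid handle a second time) passes the library a
           garbage pointer, the kind of bug that shows up as a segmentation
           fault inside the MPI library instead of the application. */
        fprintf(stderr, "MPI_File_open failed; not closing the handle\n");
    } else {
        MPI_File_close(&fh);
    }

    MPI_Finalize();
    return 0;
}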

You might need to raise the issue with the SU2 community and ask if there are 
any known issues with the application, or your particular use case of that 
application.

From: Aziz Ogutlu 
Sent: Wednesday, August 9, 2023 10:08 AM
To: Jeff Squyres (jsquyres) ; Open MPI Users 

Subject: Re: [OMPI users] Segmentation fault


Hi Jeff,

I also tried with Open MPI 4.1.5 and got the same error.


On 8/9/23 17:05, Jeff Squyres (jsquyres) wrote:
I'm afraid I don't know anything about the SU2 application.

You are using Open MPI v4.0.3, which is fairly old.  Many bug fixes have been 
released since that version.  Can you upgrade to the latest version of Open MPI 
(v4.1.5)?

From: users 
 on 
behalf of Aziz Ogutlu via users 

Sent: Wednesday, August 9, 2023 3:26 AM
To: Open MPI Users 
Cc: Aziz Ogutlu 
Subject: [OMPI users] Segmentation fault


Hi there all,

We're using SU2 with Open MPI 4.0.3 and GCC 8.5.0 on Red Hat 7.9. We compiled 
all components for use on our HPC system.

When I run SU2 with the QuickStart config file under Open MPI, it gives the 
error shown in the attached file.
The command is:
mpirun -np 8 --allow-run-as-root SU2_CFD inv_NACA0012.cfg

--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  
www.eduline.com.tr
Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72

--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  
www.eduline.com.tr
Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72


Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
Without knowing anything about SU2, we can't really help debug the issue.  The 
seg fault stack trace that you provided was quite deep; we don't really have 
the resources to go learn about how a complex application like SU2 is 
implemented -- sorry!

Can you or they provide a small, simple MPI application that replicates the 
issue?  That would be something we could dig into and investigate.
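
For reference, a reproducer along those lines could be as small as the sketch below 
(a hypothetical example; the file name and sizes are arbitrary and not taken from SU2). 
Every rank opens a file collectively, writes its own contiguous block, and closes it, 
the same MPI_File_open / write / MPI_File_close sequence that appears in the backtrace:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 20;                    /* 1 Mi doubles per rank */
    double *buf = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++) buf[i] = (double)rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "reproducer.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes its block at a disjoint offset with a collective call. */
    MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, count, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);   /* the call that segfaults in the SU2 backtrace */

    free(buf);
    MPI_Finalize();
    return 0;
}

Built with mpicc and run with, e.g., mpirun -np 8 ./reproducer, this either crashes the 
same way (pointing at the MPI I/O layer) or runs cleanly (pointing back at the application).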

From: Aziz Ogutlu 
Sent: Wednesday, August 9, 2023 10:31 AM
To: Jeff Squyres (jsquyres) ; Open MPI Users 

Subject: Re: [OMPI users] Segmentation fault


Hi Jeff,


I'm also using the latest SU2 version, and I opened an issue on the SU2 GitHub page. 
They say it could be related to Open MPI :)


On 8/9/23 17:28, Jeff Squyres (jsquyres) wrote:
Ok, thanks for upgrading.  Are you also using the latest version of SU2?

Without knowing what that application is doing, it's a little hard to debug the 
issue from our side.  At first glance, it looks like it is crashing when it has 
completed writing a file and is attempting to close it.  But the pointer that 
Open MPI got to close the file looks like it is bogus (i.e., 0x30 instead of a 
real pointer value).

You might need to raise the issue with the SU2 community and ask if there are 
any known issues with the application, or your particular use case of that 
application.

From: Aziz Ogutlu 

Sent: Wednesday, August 9, 2023 10:08 AM
To: Jeff Squyres (jsquyres) ; 
Open MPI Users 
Subject: Re: [OMPI users] Segmentation fault


Hi Jeff,

I also tried with Open MPI 4.1.5 and got the same error.


On 8/9/23 17:05, Jeff Squyres (jsquyres) wrote:
I'm afraid I don't know anything about the SU2 application.

You are using Open MPI v4.0.3, which is fairly old.  Many bug fixes have been 
released since that version.  Can you upgrade to the latest version of Open MPI 
(v4.1.5)?

From: users 
 on 
behalf of Aziz Ogutlu via users 

Sent: Wednesday, August 9, 2023 3:26 AM
To: Open MPI Users 
Cc: Aziz Ogutlu 
Subject: [OMPI users] Segmentation fault


Hi there all,

We're using SU2 with Open MPI 4.0.3 and GCC 8.5.0 on Red Hat 7.9. We compiled 
all components for use on our HPC system.

When I run SU2 with the QuickStart config file under Open MPI, it gives the 
error shown in the attached file.
The command is:
mpirun -np 8 --allow-run-as-root SU2_CFD inv_NACA0012.cfg

--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  
www.eduline.com.tr
Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72

--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  
www.eduline.com.tr
Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72

--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  
www.eduline.com.tr
Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72


Re: [OMPI users] Segmentation fault

2023-08-09 Thread Aziz Ogutlu via users

Hi Jeff,


I'm also using the latest SU2 version, and I opened an issue on the SU2 GitHub page. 
They say it could be related to Open MPI :)



On 8/9/23 17:28, Jeff Squyres (jsquyres) wrote:

Ok, thanks for upgrading.  Are you also using the latest version of SU2?

Without knowing what that application is doing, it's a little hard to 
debug the issue from our side.  At first glance, it looks like it is 
crashing when it has completed writing a file and is attempting to 
close it.  But the pointer that Open MPI got to close the file looks 
like it is bogus (i.e., 0x30 instead of a real pointer value).


You might need to raise the issue with the SU2 community and ask if 
there are any known issues with the application, or your particular 
use case of that application.


*From:* Aziz Ogutlu 
*Sent:* Wednesday, August 9, 2023 10:08 AM
*To:* Jeff Squyres (jsquyres) ; Open MPI Users 


*Subject:* Re: [OMPI users] Segmentation fault

Hi Jeff,

I also tried with Open MPI 4.1.5 and got the same error.


On 8/9/23 17:05, Jeff Squyres (jsquyres) wrote:

I'm afraid I don't know anything about the SU2 application.

You are using Open MPI v4.0.3, which is fairly old.  Many bug fixes 
have been released since that version.  Can you upgrade to the latest 
version of Open MPI (v4.1.5)?


*From:* users  
 on behalf of Aziz Ogutlu 
via users  

*Sent:* Wednesday, August 9, 2023 3:26 AM
*To:* Open MPI Users  

*Cc:* Aziz Ogutlu  


*Subject:* [OMPI users] Segmentation fault

Hi there all,

We're using SU2 with Open MPI 4.0.3 and GCC 8.5.0 on Red Hat 7.9. We 
compiled all components for use on our HPC system.


When I run SU2 with the QuickStart config file under Open MPI, it gives 
the error shown in the attached file.

The command is:
|mpirun -np 8 --allow-run-as-root SU2_CFD inv_NACA0012.cfg|

--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  www.eduline.com.tr

Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72

--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  www.eduline.com.tr

Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72


--
Best regards,
Aziz Öğütlü

Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.  www.eduline.com.tr
Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane -  İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72


Re: [OMPI users] MPI I/O, Romio vs Ompio on GPFS

2023-08-09 Thread Latham, Robert J. via users
Hah! Look at me resurrecting this old thread... I should check in on my OpenMPI 
folder more often than every 18 months.

Historically, GPFS performs best with block-aligned I/O.

Today, the performance difference between unaligned and aligned I/O is not as 
dramatic as it used to be (at least on ORNL's Summit file system).

I don't know the OMPIO tuning space, but on the ROMIO side of things, setting 
"striping_unit" to anything will also make ROMIO align "file domains" to 
that value.

So for GPFS, if I know the block size of the file system is 4 MiB, I'll set 
"striping_unit" to 4194304.

(A ROMIO "file domain" is the region of the file that the aggregator is 
responsible for)
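
As a sketch (hypothetical code, not something from this thread), passing that hint 
through an MPI_Info object looks roughly like this; the 4194304 value assumes the 
4 MiB GPFS block size used in the example above:

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    /* Hint values are strings; ROMIO aligns its file domains to this size. */
    MPI_Info_set(info, "striping_unit", "4194304");

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "aligned_output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    /* ... collective writes as usual ... */
    MPI_File_close(&fh);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}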

This writeup and paper are pretty old at this point, but the lessons still hold:

https://wordpress.cels.anl.gov/romio/2008/11/20/tuning-collective-io-strategies-for-gpfs-and-lustre/

I agree with Edgar Gabriel: the GPFS low-level hints sure sound promising but 
in practice I have not seen drastic (or any) performance difference using them.

==rob

On Tue, 2022-06-14 at 16:35 +, Edgar Gabriel via users wrote:
> 
> 
> Hi,
>  
> There are a few things that you could test to see whether they make
> difference.
>  
>    1. Try to modify the number of aggregators used in collective I/O
> (assuming that the code uses collective I/O). You could try e.g. to
> set it to the number of nodes used (the algorithm determining the
> number of aggregators automatically is sometimes overly aggressive).
> E.g.
>  
> mpirun --mca io_ompio_num_aggregators 16 -np 256 ./executable name
>  
> (assuming here that you run 256 processes distributed on 16 nodes).
> Based on our tests from a while back, GPFS was not super sensitive to
> this, but you never know; it's worth a try.
>  
>    2. If your data is large and mostly contiguous, you could try to
> disable data sieving for write operations, e.g.
>  
> mpirun --mca fbtl_posix_write_datasieving 0 -np 256 ./…
>  
> Let me know if these make a difference. There are quite a few
> info objects that the GPFS fs component understands and that
> could potentially be used to tune the performance, but I do not have
> experience with them; they are based on code contributed by the HLRS
> a couple of years ago. You can still have a look at them and see
> whether some of them would make sense (source location:
> ompi/ompi/mca/fs/gpfs/fs_gpfs_file_set_info.c).
>  
> Thanks
> Edgar
>  
>  
> 
> 
> From: users  On Behalf Of Eric
> Chamberland via users
> Sent: Saturday, June 11, 2022 9:28 PM
> To: Open MPI Users 
> Cc: Eric Chamberland ; Ramses van
> Zon ; Vivien Clauzon
> ; dave.mar...@giref.ulaval.ca; Thomas
> Briffard 
> Subject: Re: [OMPI users] MPI I/O, Romio vs Ompio on GPFS
>  
> Hi,
> I just about found what I wanted with "--mca io_base_verbose 100".
> Now I am looking at performance on GPFS, and I must say Open MPI
> 4.1.2 performs very poorly when it comes time to write.
> I am launching 512 processes that read + compute (ghost components of a
> mesh), and then later write a 79 GB file.
> Here are the timings (all in seconds):
> 
> IO module ; reading + ghost computing ; writing
> ompio     ; 24.9                      ; 2040+ (job got killed before completion)
> romio321  ; 20.8                      ; 15.6
> 
> I have run the job many times with the ompio module (the default) and
> romio, and the timings are always similar to those given.
> I also activated maximum debug output with "--mca mca_base_verbose
> stdout,level:9 --mca mpi_show_mca_params all --mca io_base_verbose
> 100" and got a few lines, but nothing relevant to debug:
> Sat Jun 11 20:08:28 2022:chrono::ecritMaillageMPI::debut
> VmSize: 6530408 VmRSS: 5599604 VmPeak: 7706396 VmData: 5734408 VmHWM:
> 5699324 
> Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683]
> io:base:delete: deleting file: resultat01_-2.mail
> Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683]
> io:base:delete: Checking all available modules
> Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683]
> io:base:delete: component available: ompio, priority: 30
> Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683]
> io:base:delete: component available: romio321, priority: 10
> Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683]
> io:base:delete: Selected io component ompio
> Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683]
> io:base:file_select: new file: resultat01_-2.mail
> Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683]
> io:base:file_select: Checking all available modules
> Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683]
> io:base:file_select: component available: ompio, priority: 30
> Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683]
> io:base:file_select: component available: romio321, priority: 10
> Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683]
> io:base:file_select: Selected io module ompio
> 
> What else can I do to dig into this?
> Are there parameters ompio is aware of with GPFS?
> Thanks,
> Eric
> -- 
> Eric Chamberland,