Hah! Look at me resurrecting this old thread... I should check in on my OpenMPI folder more often than every 18 months.
Historically, GPFS performs best with block-aligned I/O. Today, the
performance difference between unaligned and aligned I/O is not as dramatic
as it used to be (at least on ORNL's Summit file system).

I don't know the OMPIO tuning space, but on the ROMIO side of things,
setting the "striping_unit" hint to any value will also make ROMIO align
"file domains" to that value. (A ROMIO "file domain" is the region of the
file that an aggregator is responsible for.) So for GPFS, if I know the
block size of the file system is 4 MiB, I'll set "striping_unit" to
4194304. (A minimal sketch of passing this hint through MPI_Info is
appended at the end of this message, after the quoted thread.)

This writeup and paper are pretty old at this point, but the lessons still
hold:
https://wordpress.cels.anl.gov/romio/2008/11/20/tuning-collective-io-strategies-for-gpfs-and-lustre/

I agree with Edgar Gabriel: the GPFS low-level hints sure sound promising,
but in practice I have not seen a drastic (or any) performance difference
using them.

==rob

On Tue, 2022-06-14 at 16:35 +0000, Edgar Gabriel via users wrote:
> Hi,
>
> There are a few things that you could test to see whether they make a
> difference.
>
> 1. Try to modify the number of aggregators used in collective I/O
>    (assuming that the code uses collective I/O). You could try, e.g.,
>    setting it to the number of nodes used (the algorithm determining
>    the number of aggregators automatically is sometimes overly
>    aggressive). E.g.:
>
>    mpirun --mca io_ompio_num_aggregators 16 -np 256 ./executable_name
>
>    (assuming here that you run 256 processes distributed over 16
>    nodes). Based on our tests from a while back, GPFS was not super
>    sensitive to this, but you never know; it's worth a try.
>
> 2. If your data is large and mostly contiguous, you could try to
>    disable data sieving for write operations, e.g.:
>
>    mpirun --mca fbtl_posix_write_datasieving 0 -np 256 ./…
>
> Let me know if these make a difference. There are quite a few info
> objects that the gpfs fs component understands and that could
> potentially be used to tune the performance, but I do not have
> experience with them; they are based on code contributed by the HLRS
> a couple of years ago. You can still have a look at them and see
> whether some of them would make sense (source location:
> ompi/ompi/mca/fs/gpfs/fs_gpfs_file_set_info.c).
>
> Thanks
> Edgar
>
>
> From: users <users-boun...@lists.open-mpi.org> On Behalf Of Eric
> Chamberland via users
> Sent: Saturday, June 11, 2022 9:28 PM
> To: Open MPI Users <users@lists.open-mpi.org>
> Cc: Eric Chamberland <eric.chamberl...@giref.ulaval.ca>; Ramses van
> Zon <r...@scinet.utoronto.ca>; Vivien Clauzon
> <vivien.clau...@michelin.com>; dave.mar...@giref.ulaval.ca; Thomas
> Briffard <thomas.briff...@michelin.com>
> Subject: Re: [OMPI users] MPI I/O, Romio vs Ompio on GPFS
>
> Hi,
>
> I almost found what I wanted with "--mca io_base_verbose 100".
>
> Now I am looking at performance on GPFS, and I must say OpenMPI 4.1.2
> performs very poorly when it comes time to write.
>
> I am launching 512 processes that read + compute (ghost components of
> a mesh) and then later write a 79 GB file.
>
> Here are the timings (all in seconds):
>
> --------------------
> IO module ; reading + ghost computing ; writing
> ompio     ; 24.9                      ; 2040+ (job got killed before completion)
> romio321  ; 20.8                      ; 15.6
> --------------------
>
> I have run the job many times with the Ompio module (the default) and
> with Romio, and the timings are always similar to those given.
> I also activated maximum debug output with "--mca mca_base_verbose
> stdout,level:9 --mca mpi_show_mca_params all --mca io_base_verbose
> 100" and got a few lines but nothing relevant to debug:
>
> Sat Jun 11 20:08:28 2022<stdout>:chrono::ecritMaillageMPI::debut
> VmSize: 6530408 VmRSS: 5599604 VmPeak: 7706396 VmData: 5734408 VmHWM:
> 5699324 <etiq_143>
> Sat Jun 11 20:08:28 2022<stdout>:[nia0073.scinet.local:236683]
> io:base:delete: deleting file: resultat01_-2.mail
> Sat Jun 11 20:08:28 2022<stdout>:[nia0073.scinet.local:236683]
> io:base:delete: Checking all available modules
> Sat Jun 11 20:08:28 2022<stdout>:[nia0073.scinet.local:236683]
> io:base:delete: component available: ompio, priority: 30
> Sat Jun 11 20:08:28 2022<stdout>:[nia0073.scinet.local:236683]
> io:base:delete: component available: romio321, priority: 10
> Sat Jun 11 20:08:28 2022<stdout>:[nia0073.scinet.local:236683]
> io:base:delete: Selected io component ompio
> Sat Jun 11 20:08:28 2022<stdout>:[nia0073.scinet.local:236683]
> io:base:file_select: new file: resultat01_-2.mail
> Sat Jun 11 20:08:28 2022<stdout>:[nia0073.scinet.local:236683]
> io:base:file_select: Checking all available modules
> Sat Jun 11 20:08:28 2022<stdout>:[nia0073.scinet.local:236683]
> io:base:file_select: component available: ompio, priority: 30
> Sat Jun 11 20:08:28 2022<stdout>:[nia0073.scinet.local:236683]
> io:base:file_select: component available: romio321, priority: 10
> Sat Jun 11 20:08:28 2022<stdout>:[nia0073.scinet.local:236683]
> io:base:file_select: Selected io module ompio
>
> What else can I do to dig into this?
> Are there parameters ompio is aware of with GPFS?
>
> Thanks,
> Eric
>
> --
> Eric Chamberland, ing., M. Ing
> Professionnel de recherche
> GIREF/Université Laval
> (418) 656-2131 poste 41 22 42
>
> On 2022-06-10 16:23, Eric Chamberland via users wrote:
> > Hi,
> >
> > I want to try romio with OpenMPI 4.1.2 because I am observing a big
> > performance difference with IntelMPI on GPFS.
> >
> > I want to see, at *runtime*, all parameters (default values, names)
> > used by MPI (at least for the "io" framework).
> >
> > I would like to have all the same output as "ompi_info --all" gives
> > me...
> >
> > I have tried this:
> >
> > mpiexec --mca io romio321 --mca mca_verbose 1 --mca
> > mpi_show_mca_params 1 --mca io_base_verbose 1 ...
> >
> > But I cannot see anything about io coming out...
> >
> > With "ompi_info" I do...
> >
> > Is it possible?
> >
> > Thanks,
> >
> > Eric
>
> --
> Eric Chamberland, ing., M. Ing
> Professionnel de recherche
> GIREF/Université Laval
> (418) 656-2131 poste 41 22 42
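
For concreteness, here is a minimal sketch of passing the hints discussed
above through an MPI_Info object at open time: "striping_unit" for ROMIO
file-domain alignment and "cb_nodes" for the number of collective-buffering
aggregators. The 4194304 assumes a 4 MiB GPFS block size, the 16 assumes
one aggregator per node on 16 nodes, and "output.dat" is just a placeholder;
whether a given hint is honored depends on which io component (ompio or
romio321) is actually selected.

/* Illustrative sketch: open a file with the hints discussed above.
 * "striping_unit" and "cb_nodes" are standard ROMIO hint names; the
 * values and the file name below are placeholders/assumptions. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* Assumption: GPFS block size is 4 MiB, so align ROMIO file
     * domains to 4194304 bytes. */
    MPI_Info_set(info, "striping_unit", "4194304");
    /* Assumption: 16 nodes, one collective-buffering aggregator each. */
    MPI_Info_set(info, "cb_nodes", "16");

    /* "output.dat" is a placeholder file name. */
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* ... collective writes (e.g. MPI_File_write_at_all) go here ... */

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}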
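
And since the original question was how to see, at runtime, what the io
layer ended up using: MPI_File_get_info returns the hints in effect for an
open file, so a small helper like the sketch below, called right after
MPI_File_open (for instance in the program above), will print them. It only
shows file hints, not every MCA parameter; for those, "ompi_info --all" and
"--mca mpi_show_mca_params all" remain the tools mentioned in the thread.

/* Illustrative sketch: print the hints the selected io component
 * actually applied to an open file (cb_nodes, cb_buffer_size,
 * striping_unit, romio_cb_write, ... depending on the component). */
#include <mpi.h>
#include <stdio.h>

static void dump_file_hints(MPI_File fh)
{
    MPI_Info info;
    int nkeys, i, rank, flag;
    char key[MPI_MAX_INFO_KEY];
    char value[MPI_MAX_INFO_VAL];

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Returns a new info object describing the hints in effect. */
    MPI_File_get_info(fh, &info);
    MPI_Info_get_nkeys(info, &nkeys);

    if (rank == 0) {
        for (i = 0; i < nkeys; i++) {
            MPI_Info_get_nthkey(info, i, key);
            MPI_Info_get(info, key, MPI_MAX_INFO_VAL - 1, value, &flag);
            if (flag)
                printf("hint %-24s = %s\n", key, value);
        }
    }
    MPI_Info_free(&info);
}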