Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?
On Wed, 25 Nov 2020, Dave Love via users wrote:

>> The perf test says romio performs a bit better. Also -- from overall
>> time -- it's faster on IMB-IO (which I haven't looked at in detail,
>> and ran with suboptimal striping).
>
> I take that back. I can't reproduce a significant difference for total
> IMB-IO runtime, with both run in parallel on 16 ranks, using either
> the system default of a single 1MB stripe or using eight stripes. I
> haven't teased out figures for different operations yet. That must
> have been done elsewhere, but I've never seen figures.

But remember that IMB-IO doesn't cover everything. For example, hdf5's t_bigio parallel test appears to be a pathological case, and OMPIO is 2 orders of magnitude slower on a Lustre filesystem:

- OMPI's default MPI-IO implementation on Lustre (ROMIO): 21 seconds
- OMPI's alternative MPI-IO implementation on Lustre (OMPIO): 2554 seconds

End users seem to have the choice of:

- use openmpi 4.x and have some things broken (romio)
- use openmpi 4.x and have some things slow (ompio)
- use openmpi 3.x and everything works

My concern is that openmpi 3.x is near, or at, end of life.

Mark

t_bigio runs on centos 7, gcc 4.8.5, ppc64le, openmpi 4.0.5, hdf5 1.10.7, Lustre 2.12.5:

[login testpar]$ time mpirun -np 6 ./t_bigio
Testing Dataset1 write by ROW
Testing Dataset2 write by COL
Testing Dataset3 write select ALL proc 0, NONE others
Testing Dataset4 write point selection
Read Testing Dataset1 by COL
Read Testing Dataset2 by ROW
Read Testing Dataset3 read select ALL proc 0, NONE others
Read Testing Dataset4 with Point selection
***Express test mode on. Several tests are skipped

real    0m21.141s
user    2m0.318s
sys     0m3.289s

[login testpar]$ export OMPI_MCA_io=ompio
[login testpar]$ time mpirun -np 6 ./t_bigio
Testing Dataset1 write by ROW
Testing Dataset2 write by COL
Testing Dataset3 write select ALL proc 0, NONE others
Testing Dataset4 write point selection
Read Testing Dataset1 by COL
Read Testing Dataset2 by ROW
Read Testing Dataset3 read select ALL proc 0, NONE others
Read Testing Dataset4 with Point selection
***Express test mode on. Several tests are skipped

real    42m34.103s
user    213m22.925s
sys     8m6.742s
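For anyone reproducing the comparison above: the MPI-IO backend can be switched per run without rebuilding anything. A minimal sketch, assuming Open MPI 4.x (where the ROMIO component is named romio321); the rank count and ./t_bigio path are taken from the transcript above:

```shell
# List the MPI-IO components this Open MPI build provides
# (expect to see ompio and romio321 on 4.x):
ompi_info | grep ' io '

# Force a backend explicitly on the command line...
mpirun --mca io romio321 -np 6 ./t_bigio   # ROMIO
mpirun --mca io ompio    -np 6 ./t_bigio   # OMPIO

# ...or via the environment, as in the transcript above:
export OMPI_MCA_io=ompio
time mpirun -np 6 ./t_bigio
```

Setting the MCA parameter in the environment affects every subsequent mpirun in that shell, so the command-line form is safer when timing the two backends side by side.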
Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?
I will have a look at the t_bigio tests on Lustre with ompio. We had some reports from collaborators about performance problems similar to the one that you mention here (which was the reason we were hesitant to make ompio the default on Lustre), but part of the problem is that we were not able to reproduce them reliably on the systems that we had access to, which makes debugging and fixing the issue very difficult. Lustre is a very unforgiving file system: if you get something wrong with the settings, the performance is not just a bit off, but often off by orders of magnitude (as in your measurements).

Thanks!
Edgar

-----Original Message-----
From: users On Behalf Of Mark Dixon via users
Sent: Thursday, November 26, 2020 9:38 AM
To: Dave Love via users
Cc: Mark Dixon ; Dave Love
Subject: Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

> On Wed, 25 Nov 2020, Dave Love via users wrote:
> [...]
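On Edgar's point about Lustre settings: the stripe layout a test directory actually has is worth checking before comparing backends. A sketch of the relevant lfs commands (the /lustre/scratch/testpar path is hypothetical; substitute your own test directory):

```shell
# Show the current stripe layout (stripe count, stripe size, OST placement):
lfs getstripe /lustre/scratch/testpar

# Stripe new files in this directory across 8 OSTs with a 1 MiB stripe size
# (the two layouts compared earlier in the thread were 1 stripe and 8 stripes):
lfs setstripe -c 8 -S 1m /lustre/scratch/testpar
```

Note that lfs setstripe on a directory only affects files created afterwards; existing files keep the layout they were written with.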
Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?
Hi Edgar,

Thank you so much for your reply. Having run a number of Lustre systems over the years, I fully sympathise with your characterisation of Lustre as being very unforgiving!

Best wishes,

Mark

On Thu, 26 Nov 2020, Gabriel, Edgar wrote:

> I will have a look at the t_bigio tests on Lustre with ompio. We had
> some reports from collaborators about performance problems similar to
> the one that you mention here (which was the reason we were hesitant
> to make ompio the default on Lustre), but part of the problem is that
> we were not able to reproduce them reliably on the systems that we had
> access to, which makes debugging and fixing the issue very difficult.
> Lustre is a very unforgiving file system: if you get something wrong
> with the settings, the performance is not just a bit off, but often
> off by orders of magnitude (as in your measurements).
>
> Thanks!
> Edgar
> [...]