Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-26 Thread Mark Dixon via users

On Wed, 25 Nov 2020, Dave Love via users wrote:


The perf test says romio performs a bit better.  Also -- from overall
time -- it's faster on IMB-IO (which I haven't looked at in detail, and
ran with suboptimal striping).


I take that back.  I can't reproduce a significant difference for total
IMB-IO runtime, with both run in parallel on 16 ranks, using either the
system default of a single 1MB stripe or using eight stripes.  I haven't
teased out figures for different operations yet.  That must have been
done elsewhere, but I've never seen figures.
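For reference, the stripe settings mentioned above can be inspected and changed with Lustre's lfs tool; a sketch, assuming a hypothetical scratch directory (new files inherit the directory's layout):

```shell
# Show the current stripe count and stripe size of a directory
lfs getstripe /lustre/scratch/results

# Have new files in this directory use eight 1 MiB stripes
lfs setstripe -c 8 -S 1M /lustre/scratch/results
```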


But remember that IMB-IO doesn't cover everything. For example, hdf5's 
t_bigio parallel test appears to be a pathological case and OMPIO is 2 
orders of magnitude slower on a Lustre filesystem:


- OMPI's default MPI-IO implementation on Lustre (ROMIO): 21 seconds
- OMPI's alternative MPI-IO implementation on Lustre (OMPIO): 2554 seconds

End users seem to have the choice of:

- use openmpi 4.x and have some things broken (romio)
- use openmpi 4.x and have some things slow (ompio)
- use openmpi 3.x and everything works

My concern is that openmpi 3.x is near, or at, end of life.
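For anyone wanting to compare the two themselves, the MPI-IO implementation can be selected at run time via the io MCA framework; a sketch (in Open MPI 4.x the ROMIO component is named romio321, and older releases may differ):

```shell
# Run with ROMIO explicitly selected
mpirun --mca io romio321 -np 6 ./t_bigio

# Run with OMPIO, either on the command line...
mpirun --mca io ompio -np 6 ./t_bigio

# ...or via the environment, as in the transcript below
export OMPI_MCA_io=ompio
mpirun -np 6 ./t_bigio
```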

Mark


t_bigio runs on centos 7, gcc 4.8.5, ppc64le, openmpi 4.0.5, hdf5 1.10.7, 
Lustre 2.12.5:

[login testpar]$ time mpirun -np 6 ./t_bigio

Testing Dataset1 write by ROW

Testing Dataset2 write by COL

Testing Dataset3 write select ALL proc 0, NONE others

Testing Dataset4 write point selection

Read Testing Dataset1 by COL

Read Testing Dataset2 by ROW

Read Testing Dataset3 read select ALL proc 0, NONE others

Read Testing Dataset4 with Point selection
***Express test mode on.  Several tests are skipped

real    0m21.141s
user    2m0.318s
sys     0m3.289s


[login testpar]$ export OMPI_MCA_io=ompio
[login testpar]$ time mpirun -np 6 ./t_bigio

Testing Dataset1 write by ROW

Testing Dataset2 write by COL

Testing Dataset3 write select ALL proc 0, NONE others

Testing Dataset4 write point selection

Read Testing Dataset1 by COL

Read Testing Dataset2 by ROW

Read Testing Dataset3 read select ALL proc 0, NONE others

Read Testing Dataset4 with Point selection
***Express test mode on.  Several tests are skipped

real    42m34.103s
user    213m22.925s
sys     8m6.742s



Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-26 Thread Gabriel, Edgar via users
I will have a look at the t_bigio tests on Lustre with ompio.  We had some 
reports from collaborators about performance problems similar to the one you 
mention here (which was the reason we were hesitant to make ompio the default 
on Lustre), but part of the problem is that we were not able to reproduce 
them reliably on the systems we had access to, which makes debugging and 
fixing the issue very difficult. Lustre is a very unforgiving file system: if 
you get something wrong with the settings, the performance is not just a bit 
off, but often orders of magnitude off (as in your measurements).
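For what it's worth, the settings Open MPI is actually using can be listed with ompi_info, which may help in spotting a bad value; a sketch:

```shell
# List all ompio MCA parameters, including internal tuning knobs
ompi_info --param io ompio --level 9

# The Lustre-specific file system component has its own parameters
ompi_info --param fs lustre --level 9
```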

Thanks!
Edgar




Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-26 Thread Mark Dixon via users

Hi Edgar,

Thank you so much for your reply. Having run a number of Lustre systems 
over the years, I fully sympathise with your characterisation of Lustre as 
being very unforgiving!


Best wishes,

Mark

On Thu, 26 Nov 2020, Gabriel, Edgar wrote:

I will have a look at the t_bigio tests on Lustre with ompio.  We had 
some reports from collaborators about performance problems similar to 
the one you mention here (which was the reason we were hesitant to make 
ompio the default on Lustre), but part of the problem is that we were 
not able to reproduce them reliably on the systems we had access to, 
which makes debugging and fixing the issue very difficult. Lustre is a 
very unforgiving file system: if you get something wrong with the 
settings, the performance is not just a bit off, but often orders of 
magnitude off (as in your measurements).


Thanks!
Edgar
