Hi Daniel,
As Mohamad alluded to, we have developed a framework for auto-tuning HDF5
applications which is going to be presented at this year's
Supercomputing conference:
http://sc13.supercomputing.org/schedule/event_detail.php?evid=pap511
I have recently installed this framework on Blue Waters. If you are
interested in further improving your application's I/O performance, I'd
be happy to help; feel free to contact me directly to follow up.
Thanks,
Babak
On 09/03/2013 11:14 AM, Mohamad Chaarawi wrote:
Hi Daniel,
I'm not sure what the issue with the forum email list is, but nobody seems to
have this problem. Just make sure you are always sending your messages and
replies to [email protected]; not another address.
I'll ask the sysadmins to look into this issue more.
Now, to your results: the multiple-file strategy is, at least in most
cases, going to be the fastest strategy, since there is no lock contention
and no inter-process communication overhead.
The difference in performance with the single file strategy still seems a bit
high in your case, but again I'm saying this with a total lack of knowledge on
how your benchmark/application is accessing the file. I do not believe chunking
will help here.
One thing worth trying is varying the number of MPI aggregators. Which MPI
library are you using? The MPI-IO layer is most probably ROMIO, so it should
accept info hints (I'm not sure whether the top-level implementation ignores
those hints, but you can check anyway).
So use an MPI info object, passed to H5Pset_fapl_mpio(), to set the number
of MPI aggregators (cb_nodes) and the collective buffer size
(cb_buffer_size). A full list of ROMIO hints can be found here:
http://www.mcs.anl.gov/research/projects/romio/doc/users-guide.pdf
I would set cb_nodes to the stripe count, and try cb_buffer_size equal to the
stripe size. Those are not necessarily the ideal values, but it's best to start there.
I know that all this tuning is a burden for an application user of HDF5, but
that is what needs to be done today to get good performance. There has been
some work aimed at auto-tuning this parameter space with a separate tool, but
it is not yet user-friendly enough for someone to simply grab, deploy, and run.
Thanks,
Mohamad
-----Original Message-----
From: Hdf-forum [mailto:[email protected]] On Behalf Of
Daniel Langr
Sent: Tuesday, September 03, 2013 10:38 AM
To: [email protected]
Subject: Re: [Hdf-forum] Very poor performance of pHDF5 when using single
(shared) file
Mohamad,
I really do not understand how to reply to this forum :(. I tried to reply to
your post, which I received via e-mail. In this e-mail, there was the following
note:
"
If you reply to this email, your message will be added to the discussion
below:
http://hdf-forum.184993.n3.nabble.com/Very-poor-performance-of-pHDF5-when-using-single-shared-file-tp4026443p4026449.html
"
So, I replied to this e-mail, and received another one:
Subject: Delivery Status Notification (Failure)
"
Delivery to the following recipient failed permanently:
[email protected]
Your email to [email protected] has been rejected
because you are not allowed to post to
http://hdf-forum.184993.n3.nabble.com/Very-poor-performance-of-pHDF5-when-using-single-shared-file-tp4026443p4026449.html
. Please contact the owner about permissions or visit the Nabble Support
forum.
"
What the hell... why does it say I should reply and then that I am not
allowed to post to my own thread???
Anyway, I tried to post the following information:
I did some experiments yesterday using the Blue Waters cluster. The
stripe count is limited there to 160. For runs with 256 MPI
processes/cores and fixed datasets, the writing times were:
separate files: 1.36 [s]
single file, 1 stripe: 133.6 [s]
single file, best result: 17.2 [s]
(I did multiple runs with various combinations of stripe count and size;
I am presenting the best results I obtained.)
Increasing the number of stripes obviously helped a lot, but compared
with the separate-files strategy, writing is still more than ten times
slower. Do you think that is "normal"?
Might chunking help here?
Thanks,
Daniel
On 30. 8. 2013 16:05, Daniel Langr wrote:
I've run a benchmark in which, within an MPI program, each process wrote
3 plain 1D arrays to 3 datasets of an HDF5 file. I used the following
writing strategies:
1) each process writes to its own file,
2) each process writes to its own dataset in a single shared file,
3) all processes write to the same dataset in a single shared file.
I've tested 1)-3) for both fixed/chunked datasets (chunk size 1024), and
I've tested 2)-3) for both independent/collective options of the MPI
driver. I've also used 3 different clusters for measurements (all quite
modern).
As a result, the running (storage) times of the same-file strategies, i.e.
2) and 3), were orders of magnitude longer than the running times of
the separate-files strategy. For illustration:
cluster #1, 512 MPI processes, each process stores 100 MB of data, fixed
data sets:
1) separate files: 2.73 [s]
2) single file, independent calls, separate data sets: 88.54 [s]
cluster #2, 256 MPI processes, each process stores 100 MB of data,
chunked data sets (chunk size 1024):
1) separate files: 10.40 [s]
2) single file, independent calls, shared data sets: 295 [s]
3) single file, collective calls, shared data sets: 3275 [s]
Any idea why the single-file strategy gives so poor writing performance?
Daniel
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org