A year or so back, we changed to BeeGFS as well. There were some issues getting parallel I/O set up. The first thing you want to do is run the parallel MPI-IO tests. I believe they can be found here: https://support.hdfgroup.org/HDF5/Tutor/pprog.html.
This will help you verify whether your cluster has MPI-IO set up correctly. If that doesn't work, you'll need to get in touch with the management group to fix it. Then you need to make sure you are using an HDF5 library that is configured for parallel I/O. I know there aren't a lot of specifics here, but it took me about two weeks of convincing before my cluster management group realized that things weren't working quite right. Once everything was set up, I was able to generate and write about 40 GB of data in around two minutes.

On Tue, May 23, 2017 at 8:18 AM, Quincey Koziol <[email protected]> wrote:

> Hi Jan,
>
> > On May 23, 2017, at 2:46 AM, Jan Oliver Oelerich <[email protected]> wrote:
> >
> > Hello HDF users,
> >
> > I am using HDF5 through NetCDF and I recently changed my program so that each MPI process writes its data directly to the output file, as opposed to the master process gathering the results and being the only one who does I/O.
> >
> > Now I see that my program slows down the file system a lot (of the whole HPC cluster) and I don't really know how to handle I/O. The file system is a high-throughput BeeGFS system.
> >
> > My program uses a hybrid parallelization approach, i.e. work is split into N MPI processes, each of which spawns M worker threads. Currently, I write to the output file from each of the M*N threads, but the writing is guarded by a mutex, so thread-safety shouldn't be a problem. Each writing process is a complete `open file, write, close file` cycle.
> >
> > Each write is at a separate region of the HDF5 file, so no chunks are shared among any two processes. The amount of data to be written per process is 1/(M*N) times the size of the whole file.
> >
> > Shouldn't this be exactly how HDF5 + MPI is supposed to be used? What is the `best practice` regarding parallel file access with HDF5?
> Yes, this is probably the correct way to operate, but generally things are much better for this case when collective I/O operations are used. Are you using collective or independent I/O? (Independent is the default.)
>
> Quincey
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
