On Mon, Aug 26, 2013 at 06:15:30AM +0000, Biddiscombe, John A. wrote:
> Rob,
>
> Did you make any significant discoveries/progress regarding the GPFS tweaks
> on BG systems. Our machine will be open for use within the next week or so
> and I'd like to begin some profiling. I'd be interested in knowing if you
> have discovered any useful facts that I ought to know about.
An upcoming driver update (I don't know which one) will allow the Blue Gene
compute nodes to send the gpfs_fcntl commands all the way through to the GPFS
file system (presently the gpfs_fcntl commands return "not supported"). Then
we can do some experiments to see whether they still provide any benefit at
Blue Gene scales (the optimizations are 15 years old at this point, designed
when a "massively parallel system" was 32 nodes).

More generally, I've found that some of the default MPI-IO settings are
probably not ideal for /Q, and have tested/suggested a change to the "number
of I/O aggregators" default. Meanwhile, ALCF (the folks who operate the
machine) have been working with IBM to improve the state of collective I/O.
It seems we're making some progress there as well.

> I'm concerned about how much the --enable-gpfs option is able to
> 'know' about the system (can we easily find out what the option
> does?). According to my superficial understanding of the BG
> architecture, it seems that since the compute nodes have IO calls
> forwarded off to the IO nodes by kernel level routines, collective
> operations performed by hdf5 might actually reduce the effectiveness
> of the IO by forcing the data to be shuffled around twice instead of
> once. Am I thinking along the right lines?

The --enable-gpfs option will attempt to do a few things:

gpfs_access_range / gpfs_free_range:
  This is the "multiple access range" hint, which tells GPFS "hey, don't
  grab a lock on the whole file; instead, just these sections". I *think*
  this is going to be one of the better improvements remaining.

gpfs_clear_file_cache / gpfs_invalidate_file_cache:
  Good for benchmarking. Ejects all entries from the GPFS page pool.

gpfs_cancel_hints:
  Just resets things.

gpfs_start_data_shipping / gpfs_start_data_ship_map / gpfs_stop_data_shipping:
  Unfortunately, GPFS 3.5 no longer supports data shipping.
I still think these hints need to be implemented in the MPI-IO library, if
they still help at all, but if one is being pragmatic one might more easily
deploy the hints through HDF5.

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
