On Fri, May 22, 2015 at 9:08 AM, Wolf Dapp <[email protected]> wrote:
> Dear forum members,
>
> I have observed an annoying occurrence many times now. I'm running
> parallel HDF5 (1.8.14) on top of OpenMPI (1.7.2) with gcc (4.8.1) on
> openSUSE Linux (13.1). The storage is located on an NFS server.
>
> Running on typically 4 cores, I'm writing relatively large files (at
> least several hundred MB, sometimes many GB) in parallel with HDF5.
> Sometimes I have to interrupt the code with a CTRL+C signal during such
> a write operation (often because of user error). Occasionally, this will
> cause a catastrophic hangup, and I get the error message:
> kernel BUG: soft lockup - CPU stuck for 23s!
>

Have you seen <http://lists.opensuse.org/archive/opensuse-bugs/2014-06/msg01135.html>?
What kernel version are you running?

>
> This will invariably cause a violent system crash after a very short
> time. I have observed this on at least 5 different machines (same
> software stack), so I don't believe it is a hardware problem. Since
> these lockups only happen during interrupted write operations, I suspect
> the HDF5 library is causing them in some way, possibly by not freeing
> some resources.
>

A "kernel BUG" needs to be fixed in the kernel, but of course some kernel
bugs are triggered by buggy user/library code.

>
> Of course, it could also be caused by OpenMPI. Due to the highly
> disruptive nature of the problem, I am not keen to try it too often. I
> cannot easily try a different (or newer) MPI implementation. It might
> also be caused by the fact that I'm not writing to a physical drive, but
> to an NFS drive.
>
> Hence a general question, without appending example code: Has anyone
> observed this behavior before, and if so, is there a fix? Am I blaming
> HDF5 unfairly, and is another cause more likely? If this error is
> unheard of, it's most likely caused by my setup...
>
> Thanks,
> Wolf
> --
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> Twitter: https://twitter.com/hdf5
>

--
George N. White III <[email protected]>
Head of St. Margarets Bay, Nova Scotia
