On Fri, May 22, 2015 at 9:08 AM, Wolf Dapp <[email protected]> wrote:

> Dear forum members,
>
> I have observed an annoying occurrence many times now. I'm running
> parallel HDF5 (1.8.14) on top of OpenMPI (1.7.2) with gcc (4.8.1) on
> OpenSuse Linux (13.1). The storage is located on an NFS server.
>
> Running on typically 4 cores, I'm writing relatively large files (at
> least several hundred MB, sometimes many GB) in parallel with HDF5.
> Sometimes I have to interrupt the code with a CTRL+C signal during such
> a write operation (often because of user error). Occasionally, this will
> cause a catastrophic hangup, and I get the error message:
> kernel BUG: soft lockup - CPU stuck for 23s!
>

Have you seen <http://lists.opensuse.org/archive/opensuse-bugs/2014-06/msg01135.html>?
What kernel version are you running?



>
> This will invariably cause a violent system crash after a very short
> time. I have observed this on at least 5 different machines (same
> software stack), and so I don't believe it is a hardware problem. Since
> these lockups only happen during interrupted write operations, I suspect
> the HDF5 library is causing them in some way, possibly by not freeing
> some resources.
>

A "kernel BUG" needs to be fixed in the kernel, but of course some
kernel bugs are triggered by buggy user/library code.


>
> Of course, it could also be caused by OpenMPI. Due to the highly
> disruptive nature of the problem, I am not keen to try it too often. I
> cannot easily try a different (or newer) MPI implementation. It might
> also be caused by the fact that I'm writing not to a physical drive but
> to an NFS drive.
>
> Hence a general question, without appending example code: Has anyone
> observed this behavior before, and if so, is there a fix? Am I blaming
> HDF5 unfairly, and another cause is more likely? If this error is
> unheard of, it's most likely caused by my setup...
>
> Thanks,
> Wolf


> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> Twitter: https://twitter.com/hdf5
>



-- 
George N. White III <[email protected]>
Head of St. Margarets Bay, Nova Scotia