On Thu, Oct 10, 2013 at 2:34 PM, James <jamesresearch...@gmail.com> wrote:
> Dear Mark,
>
> Thanks again for your response.
>
> Many of the regression tests seem to have passed:
>
> All 16 simple tests PASSED
> All 19 complex tests PASSED
> All 142 kernel tests PASSED
> All 9 freeenergy tests PASSED
> All 0 extra tests PASSED
> Error not all 42 pdb2gmx tests have been done successfully
> Only 0 energies in the log file
> pdb2gmx tests FAILED
>
> I'm not sure why pdb2gmx failed but I suppose it will not impact the
> crashing I'm experiencing.

No, that's fine. Probably they don't have sufficiently explicit guards
to stop people running the energy minimization with a more-than-useful
number of OpenMP threads.

> Regarding the stack trace showing line numbers, what is the best way
> to go about this, in this context? I'm not really experienced in that
> aspect.

That's a matter of compiling in debug mode (use cmake ..
-DCMAKE_BUILD_TYPE=Debug), and hopefully observing the same crash with
an error message that has more useful information. The debug build
annotates the executable so that a finger can be pointed at the code
line that caused the segfault. Hopefully the compiler does this
properly, but support for this in OpenMP code is a corner compiler
writers might cut ;-). Depending on the details, loading a core dump
in a debugger can also be necessary, but your local sysadmins are the
people to talk to there. Rough sketches of the debug rebuild, and of
the thread-count sweep over the regression tests suggested earlier,
are at the bottom of this mail, below the quoted thread.

Mark

> Thanks again for your help!
>
> Best regards,
>
> James
>
> On 21 September 2013 23:12, Mark Abraham <mark.j.abra...@gmail.com> wrote:
>
> > On Sat, Sep 21, 2013 at 2:45 PM, James <jamesresearch...@gmail.com> wrote:
> > > Dear Mark and the rest of the Gromacs team,
> > >
> > > Thanks a lot for your response. I have been trying to isolate the
> > > problem and have also been in discussion with the support staff.
> > > They suggested it may be a bug in the gromacs code, and I have
> > > tried to isolate the problem more precisely.
> >
> > First, do the GROMACS regression tests for Verlet kernels pass? (Run
> > them all, but those with the nbnxn prefix are of interest here.)
> > They likely won't scale to 16 OpenMP threads, but you can vary the
> > OMP_NUM_THREADS environment variable to see what you can see.
> >
> > > Considering that the calculation is run under MPI with 16 OpenMP
> > > cores per MPI node, the error seems to occur under the following
> > > conditions:
> > >
> > > A few thousand atoms: 1 or 2 MPI nodes: OK
> > > Double the number of atoms (~15,000): 1 MPI node: OK, 2 MPI nodes:
> > > SIGSEGV error described below.
> > >
> > > So it seems that the error occurs for relatively large systems
> > > which use MPI.
> >
> > ~500 atoms per core (thread) is a system in the normal GROMACS
> > scaling regime. 16 OpenMP threads is more than is useful on other
> > HPC systems, but since we don't know what your hardware is, whether
> > you are investigating something useful is your decision.
> >
> > > The crash mentions the "calc_cell_indices" function (see below).
> > > Is this somehow a problem with memory not being sufficient at the
> > > MPI interface at this function? I'm not sure how to proceed
> > > further. Any help would be greatly appreciated.
> >
> > If there is a problem with GROMACS (which so far I doubt), we'd need
> > a stack trace that shows a line number (rather than addresses) in
> > order to start to locate it.
> >
> > Mark
> >
> > > Gromacs version is 4.6.3.
> > >
> > > Thank you very much for your time.
> > >
> > > James
> > >
> > > On 4 September 2013 16:05, Mark Abraham <mark.j.abra...@gmail.com> wrote:
> > >
> > >> On Sep 4, 2013 7:59 AM, "James" <jamesresearch...@gmail.com> wrote:
> > >>
> > >> > Dear all,
> > >> >
> > >> > I'm trying to run Gromacs on a Fujitsu supercomputer but the
> > >> > software is crashing.
> > >> >
> > >> > I run grompp:
> > >> >
> > >> > grompp_mpi_d -f parameters.mdp -c system.pdb -p overthe.top
> > >> >
> > >> > and it produces the error:
> > >> >
> > >> > jwe1050i-w The hardware barrier couldn't be used and continues
> > >> > processing using the software barrier.
> > >> > taken to (standard) corrective action, execution continuing.
> > >> > error summary (Fortran)
> > >> > error number  error level  error count
> > >> > jwe1050i      w            1
> > >> > total error count = 1
> > >> >
> > >> > but still outputs topol.tpr so I can continue.
> > >>
> > >> There's no value in compiling grompp with MPI or in double precision.
> > >>
> > >> > I then run with
> > >> >
> > >> > export FLIB_FASTOMP=FALSE
> > >> > source /home/username/Gromacs463/bin/GMXRC.bash
> > >> > mpiexec mdrun_mpi_d -ntomp 16 -v
> > >> >
> > >> > but it crashes:
> > >> >
> > >> > starting mdrun 'testrun'
> > >> > 50000 steps, 100.0 ps.
> > >> > jwe0019i-u The program was terminated abnormally with signal
> > >> > number SIGSEGV.
> > >> > signal identifier = SEGV_MAPERR, address not mapped to object
> > >> > error occurs at calc_cell_indices._OMP_1 loc 0000000000233474
> > >> > offset 00000000000003b4
> > >> > calc_cell_indices._OMP_1 at loc 00000000002330c0 called from loc
> > >> > ffffffff02088fa0 in start_thread
> > >> > start_thread at loc ffffffff02088e4c called from loc
> > >> > ffffffff029d19b4 in __thread_start
> > >> > __thread_start at loc ffffffff029d1988 called from o.s.
> > >> > error summary (Fortran)
> > >> > error number  error level  error count
> > >> > jwe0019i      u            1
> > >> > jwe1050i      w            1
> > >> > total error count = 2
> > >> > [ERR.] PLE 0014 plexec The process terminated
> > >> > abnormally.(rank=1)(nid=0x03060006)(exitstatus=240)(CODE=2002,1966080,61440)
> > >> > [ERR.] PLE The program that the user specified may be illegal or
> > >> > inaccessible on the node.(nid=0x03060006)
> > >> >
> > >> > Any ideas what could be wrong? It works on my local intel machine.
> > >>
> > >> Looks like it wasn't compiled correctly for the target machine.
> > >> What was the cmake command, what does mdrun -version output? Also,
> > >> if this is the K computer, probably we can't help, because the
> > >> compiler docs are officially unavailable to us. National secret,
> > >> and all ;-)
> > >>
> > >> Mark
> > >>
> > >> > Thanks in advance,
> > >> >
> > >> > James
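P.S. Here is roughly what I mean by the debug rebuild. Treat it as a
sketch only: the source and install paths are placeholders, the
GMX_MPI/GMX_DOUBLE settings are my guess based on the mdrun_mpi_d
binary you mentioned, and you will also need whatever compiler and
toolchain options your site used for the original build. Whether gdb
(or any debugger) and core files are usable on the compute nodes is a
question for your sysadmins.

  cd gromacs-4.6.3
  mkdir build-debug && cd build-debug
  cmake .. -DCMAKE_BUILD_TYPE=Debug -DGMX_MPI=ON -DGMX_DOUBLE=ON \
        -DCMAKE_INSTALL_PREFIX=$HOME/Gromacs463-debug
  make && make install
  # reproduce the crash with the debug mdrun_mpi_d; then, if a core
  # file was written and gdb is available on that machine:
  gdb $HOME/Gromacs463-debug/bin/mdrun_mpi_d core
  (gdb) bt        # backtrace, hopefully now with file:line information

Even without a core file, the crash message itself may already report
a source file and line once the debug symbols are present.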
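And for the earlier suggestion to vary the OpenMP thread count while
running the Verlet (nbnxn) regression tests, something along these
lines is what was meant. This assumes the 4.6.x regression-test
tarball with its gmxtest.pl driver, invoked however you ran the tests
before; the nbnxn_* cases are, as far as I recall, in the complex set,
but check the test directories to be sure.

  source /home/username/Gromacs463/bin/GMXRC.bash
  cd regressiontests
  for n in 1 2 4 8 16; do
      # mdrun picks the thread count up from the environment
      export OMP_NUM_THREADS=$n
      perl gmxtest.pl complex
  done

If failures only appear at the higher thread counts, that already
narrows things down usefully.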