On 10/10/12 1:33 PM, Ladasky wrote:
Update:
Ladasky wrote
Justin Lemkul wrote
Random segmentation faults are really hard to debug. Can you resume the run
using a checkpoint file? That would suggest maybe an MPI problem or something
else external to Gromacs. Without a reproducible system and a debugging
backtrace, it's going to be hard to figure out where the problem is
coming from.
Thanks for that tip, Justin. I tried to resume one run which failed at
1.06 million cycles, and it WORKED. It proceeded all the way to the 2.50
million cycles that I designated. I now have two separate .trr files, but
I suppose they can be merged.
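
For reference, merging the two pieces is usually done with the trjcat tool that ships with GROMACS. The following is only a minimal sketch of how that call might be wrapped in Python. The helper names and the file names in the usage note are hypothetical, and it assumes trjcat is on the PATH under that name (newer GROMACS releases expose it as "gmx trjcat" instead).

```python
import subprocess


def trjcat_command(parts, output):
    """Build the argument list for concatenating trajectory files."""
    if len(parts) < 2:
        raise ValueError("need at least two trajectory files to merge")
    # -f takes the input trajectories in order, -o names the merged output.
    return ["trjcat", "-f", *parts, "-o", output]


def merge_trajectories(parts, output, run=subprocess.check_call):
    """Run trjcat on the given files; raises if it exits non-zero."""
    run(trjcat_command(parts, output))
    return output
```

Something like merge_trajectories(["part1.trr", "part2.trr"], "merged.trr") would then produce a single file. It is worth checking the frame times in the result, since the piece written after the restart may overlap the end of the first.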
I don't know whether my crashes are random yet. I will try re-running
that simulation from time zero, to see whether it segfaults at the
same place. If it doesn't, then I have a problem which may have nothing
to do with GROMACS.
I just tried exactly that, a re-run of the same structure. This time, it
ran without stopping, from time zero to 2.50 million cycles! No crash at
1.06 million cycles this time.
Unless GROMACS is using some random number generator which affects the
outcome of repeated simulations (and I think that the only time that random
number generation would be needed would be when initial velocities are
generated, which was done during the earlier equilibration step), I will
conclude that my simulation conditions are indeed acceptable, and that
sometimes the software just behaves badly.
There are plenty of things that can differ between runs (unless you've turned
off optimizations and are using the -reprod option), but in any practical sense,
they should not lead to random seg faults.
Is that a common occurrence?
Based on the fact that very few people post seg fault problems that are not
precipitated by actual crashes (i.e. LINCS warnings), I would say no. There is
no evidence yet to suggest what the real problem is, and until there is,
Gromacs is innocent until proven guilty ;)
I could write a script which just automatically restarts my simulations
provided that they 1) ran for a decent number of cycles and 2) exited with a
segmentation fault error. I could then have the script check in after a few
minutes to make sure that they haven't crashed again, and soldier on.
That's an option. If you're running in a queue system, there may be
notification options if something goes wrong, as well.
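
A restart wrapper along the lines discussed above might look something like the sketch below. It is not a tested production script. The function names are hypothetical, wall-clock run time stands in for "a decent number of cycles", and it assumes mdrun is launched directly, so that a segfault shows up as the usual POSIX signal exit status. An MPI launcher may report the failure differently.

```python
import signal
import subprocess
import time


def died_from_segfault(returncode):
    """True if a child process appears to have been killed by SIGSEGV."""
    # subprocess reports death-by-signal as the negative signal number;
    # a shell wrapper would typically report 128 + signal number instead.
    return returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV)


def run_with_restarts(first_cmd, resume_cmd, max_restarts=5,
                      min_runtime_s=600.0, run=subprocess.call,
                      clock=time.monotonic):
    """Run first_cmd; after each segfault, rerun using resume_cmd.

    Returns the final return code. Gives up if a run segfaults before
    min_runtime_s has elapsed (more likely a real problem than a fluke)
    or once max_restarts restarts have been used.
    """
    cmd = first_cmd
    restarts = 0
    while True:
        start = clock()
        rc = run(cmd)
        elapsed = clock() - start
        if not died_from_segfault(rc):
            return rc  # finished normally, or failed for another reason
        if elapsed < min_runtime_s or restarts >= max_restarts:
            return rc  # crashed too early or too often: stop retrying
        restarts += 1
        cmd = resume_cmd


# Example call (hypothetical file names; -cpi continues from a checkpoint):
#   run_with_restarts(["mdrun", "-deffnm", "md"],
#                     ["mdrun", "-deffnm", "md", "-cpi", "md.cpt"])
```

Capping the number of restarts and refusing to restart after an early crash keeps a genuinely broken system from looping forever, which matters if the run is left unattended.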
-Justin
--
========================================
Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
========================================