On 3/26/13 11:13 PM, Christopher Neale wrote:
Dear Matthew:
Thank you for noticing the file size. This is a very good lead.
I had not noticed that this was special. Indeed, here is the complete listing
for the truncated/corrupt .cpt files (note that the sizes are exactly 1 MiB and 2 MiB):
-rw-r----- 1 cneale cneale 1048576 Mar 26 18:53 md3.cpt
-rw-r----- 1 cneale cneale 1048576 Mar 26 18:54 md3.cpt
-rw-r----- 1 cneale cneale 1048576 Mar 26 18:54 md3.cpt
-rw-r----- 1 cneale cneale 1048576 Mar 26 18:54 md3.cpt
-rw-r----- 1 cneale cneale 1048576 Mar 26 18:50 md3.cpt
-rw-r----- 1 cneale cneale 1048576 Mar 26 18:50 md3.cpt
-rw-r----- 1 cneale cneale 1048576 Mar 26 18:50 md3.cpt
-rw-r----- 1 cneale cneale 1048576 Mar 26 18:51 md3.cpt
-rw-r----- 1 cneale cneale 1048576 Mar 26 18:51 md3.cpt
-rw-r----- 1 cneale cneale 2097152 Mar 26 18:52 md3.cpt
-rw-r----- 1 cneale cneale 2097152 Mar 26 18:52 md3.cpt
-rw-r----- 1 cneale cneale 2097152 Mar 26 18:52 md3.cpt
-rw-r----- 1 cneale cneale 2097152 Mar 26 18:52 md3.cpt
I will contact my sysadmins and let them know about your suggestions.
Nevertheless, I respectfully reject the idea that there is really nothing that
can be done about this inside gromacs. About 6 years ago, I worked on a cluster
with massive sporadic NFS delays. The only way to automate runs on that machine
was, for example, to use sed to create an .mdp file from a template .mdp that
had ;;;EOF as its last line, and then to poll the generated .mdp for ;;;EOF
until it appeared before running grompp (at the time I was using mdrun -sort
and desorting with an in-house script prior to domain decomposition, so I had
to stop/start gromacs every couple of hours). This is not to say that such
things are ideal, but I think gromacs would be all the better if it were able
to avoid problems like this regardless of the cluster setup.
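For concreteness, a minimal sketch of that kind of workaround might look like
this (the template name, the sed substitution, and the grompp arguments are
placeholders, not the actual script I used):

#!/bin/bash
# Sketch of the workaround described above. An .mdp is generated from a
# template whose last line is the sentinel ";;;EOF"; the job then waits until
# that sentinel is actually visible (the file contents may lag behind on NFS)
# before running grompp. All names below are placeholders.

sed 's/NSTEPS_PLACEHOLDER/500000/' template.mdp > run.mdp

# In practice the polling happened on the node that would run grompp,
# which might not yet see the complete file.
while ! grep -q '^;;;EOF' run.mdp 2>/dev/null; do
    sleep 5
done

grompp -f run.mdp -c conf.gro -p topol.top -o run.tpr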
Please note that, over the years, I have seen this on 4 different clusters
(albeit with different versions of gromacs), which is to say that it's not
just one setup that is to blame.
Matthew, please don't take my comments the wrong way. I deeply appreciate your
help. I just want to put it out there that I believe gromacs would be better if
it never overwrote good .cpt files with truncated/corrupt ones, even if the
cluster catches on fire or the earth's magnetic field reverses, etc.
Also, I suspect that sysadmins don't have a lot of time to test their clusters
for a graceful exit under chiller-failure conditions, so a super-careful regime
of .cpt updates will always be useful.
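To make that concrete, the kind of super-careful update I have in mind is the
usual write-then-rename pattern. This is only an illustration of the general
pattern (produce_checkpoint is a hypothetical stand-in), not a claim about how
mdrun currently writes its checkpoints:

#!/bin/bash
# Illustration of a crash-safe file-update pattern: write the new data to a
# temporary name, flush it to disk, then atomically rename it over the old
# file. An interrupted write then only ever leaves a truncated temporary file;
# the previous good md3.cpt is never clobbered.

set -e
tmp=md3.cpt.tmp

produce_checkpoint > "$tmp"   # hypothetical command that writes the new state
sync                          # flush dirty buffers; a real writer would fsync the fd
mv "$tmp" md3.cpt             # rename() replaces the old file atomically (POSIX)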
Thank you again for your help; I'll take it to my sysadmins, who are very good
and may be able to remedy this on their cluster, but who knows what cluster I
will be using in 5 years.
Perhaps this is a case where the -cpnum option would be useful? That may cause
a lot of checkpoint files to accumulate, depending on the length of the run,
but a scripted cleanup routine could preserve some subset of them as backups.
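A minimal sketch of such a cleanup routine, assuming the numbered checkpoints
match a glob like md3_step*.cpt (the actual -cpnum naming may differ) and
keeping only the five newest:

#!/bin/bash
# Keep the 5 newest numbered checkpoint files and delete the rest.
# The glob is an assumption; adjust it to whatever your run actually produces.

keep=5
ls -1t md3_step*.cpt 2>/dev/null \
    | tail -n +$((keep + 1)) \
    | xargs -r rm --          # -r: do nothing if the list is empty (GNU xargs)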
-Justin
--
========================================
Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
========================================