If the runs all finish successfully, then incorporating run continuations into your script is simple. I believe the bigger issue is the tendency of tpbconv to fail unpredictably: should the .edr file be even one frame shorter than the .trr file because of a crash, for instance, tpbconv will not succeed and your script dies. Parsing the relevant error messages to extract the information required (for the -time option in this example) is presumably possible and would solve the problem, but it is not trivial to script. Of course, the timescale of MD runs means that occasional manual intervention isn't too great a chore, but it can be annoying to almost complete a tpbconv on a very long run, only to find that the last couple of .edr frames are missing because the buffer was never flushed...
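Rather than parsing tpbconv's error messages after the fact, a script could choose the restart time up front: the latest time that has a .trr frame with both positions and velocities and also a matching .edr frame. That time would then be passed to tpbconv via -time. The Python sketch below assumes you have already obtained the two lists of frame times by some other means (for example by inspecting the files with gmxcheck); the function latest_common_time and its tolerance parameter are hypothetical, not part of GROMACS.

```python
from __future__ import annotations

from typing import Iterable, Optional


def latest_common_time(
    trr_times: Iterable[float],
    edr_times: Iterable[float],
    tol: float = 1e-6,
) -> Optional[float]:
    """Return the latest time found in both lists, or None if there is none.

    trr_times: times (ps) of .trr frames holding both positions and velocities.
    edr_times: times (ps) of frames actually present in the .edr file.
    tol: tolerance for matching floating-point times (illustrative default).
    """
    edr = list(edr_times)
    # Keep only .trr frames that have an energy frame at (almost) the same time.
    common = [t for t in trr_times if any(abs(t - e) <= tol for e in edr)]
    return max(common) if common else None


if __name__ == "__main__":
    # Hypothetical crash: velocities written every 10 ps, energies every 2 ps,
    # but the .edr file stops at 26 ps while the .trr file reaches 30 ps.
    trr = [0.0, 10.0, 20.0, 30.0]
    edr = [2.0 * i for i in range(14)]  # 0, 2, ..., 26 ps
    t = latest_common_time(trr, edr)
    print(f"restart from t = {t} ps (value for tpbconv -time)")
```

In this example the 30 ps .trr frame is skipped because its energy frame was never written, and the sketch falls back to 20 ps.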
Alan.

----- Original Message ----
From: David van der Spoel <[EMAIL PROTECTED]>
To: Discussion list for GROMACS users <gmx-users@gromacs.org>
Sent: Monday, January 28, 2008 7:09:29 PM
Subject: Re: [gmx-users] Checkpointing GROMACS jobs

Steven Kirk wrote:
> Hello,
>
> I have been using GROMACS for some very long (in wall clock terms)
> simulations, and am curious as to how other users on this list solve the
> problem of checkpointing long MD runs. It's a problem because of the
> tendency of computational nodes in large HPC facilities (the more
> processors, the more prevalent the problem, it seems) to keel over near
> the end of a very time consuming run. Intermittent disk and scheduler
> faults can also trigger such conditions.
>
> Checkpointing at the operating system level is very system-specific, and
> occasionally compilers can produce executable 'dump' files that continue
> from where your program left off, but I'm thinking that someone must
> have automated this process directly using conventionally-compiled
> GROMACS executables.
>
> Of course, it is possible to do an exact continuation from a crashed run
> using .edr and trajectory (.trr) files by generating a new .tpr from the
> last trajectory frame that had both position and velocity data. This
> seems to be, by necessity, an entirely interactive process (unless
> someone out there has a cool auto-restart script ..).
>
> I am thinking more in terms of 'proactive' checkpointing for long jobs,
> by the following process:
>
> A script parses the desired .mdp file describing the user's MD run of T
> timesteps, then asks the user how many sections (N) to split the run
> into. The script will then auto-generate a shell script containing all
> the necessary GROMACS commands to:
>
> * Generate a new .mdp file almost identical to the original, but with
> the number of timesteps set to T/N.
>
> * Run N successive mdrun commands, where the output .trr and .edr files
> from each short run using the modified .mdp file are used, to generate
> an 'exact restart' .tpr file for the next 'mdrun' command, with the
> appropriate continuation flag set.
>
> * Log (to a file) how many of the N partial runs have been completed, in
> such a way that if the shell script containing the commands is
> restarted, it will jump to the correct point in the sequence, restarting
> from the most recently completed partial run.
>
> Has anyone else already solved this problem, or have a method
> implementing some of the desirable properties above that I can then
> extend to do exactly the things described above?
>
>
Most queue systems allow you to chain jobs, that is, let the next one
start after the previous one has finished. In PBS this is done like

qsub -Wdepend=afterok:prev_jobid

Combining this with a script to start the jobs, you are all set.

I presume you are aware of tpbconv -extend, or tpbconv -until?

--
David.
________________________________________________________________________
David van der Spoel, PhD, Assoc. Prof., Molecular Biophysics group,
Dept. of Cell and Molecular Biology, Uppsala University.
Husargatan 3, Box 596, 75124 Uppsala, Sweden
phone: 46 18 471 4205 fax: 46 18 511 755
[EMAIL PROTECTED] [EMAIL PROTECTED] http://folding.bmc.uu.se
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
_______________________________________________
gmx-users mailing list gmx-users@gromacs.org
http://www.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/search before posting!
Please don't post (un)subscribe requests to the list. Use the
www interface or send it to [EMAIL PROTECTED]
Can't post? Read http://www.gromacs.org/mailing_lists/users.php
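The driver Steven Kirk proposes in the quoted message (split a run of T steps into N segments, rewrite nsteps in the .mdp file, and log completed segments so a restarted script resumes at the right point) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: split_steps, read_nsteps, with_nsteps, completed_segments, run_remaining and the one-line-per-segment progress log are hypothetical names and formats, and run_segment is a placeholder callable standing in for the actual mdrun/tpbconv commands, which are deliberately not spelled out here. When T is not divisible by N, the remainder is spread over the first segments so the total still equals T.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Callable

# Matches e.g. "nsteps = 500000 ; comment"; groups: prefix, value, remainder.
NSTEPS_RE = re.compile(r"^(\s*nsteps\s*=\s*)(-?\d+)(.*)$")


def split_steps(total_steps: int, n_segments: int) -> list[int]:
    """Split total_steps into n_segments near-equal parts that sum to total_steps."""
    if total_steps < 0 or n_segments < 1:
        raise ValueError("need total_steps >= 0 and n_segments >= 1")
    base, extra = divmod(total_steps, n_segments)
    # The first `extra` segments each take one additional step.
    return [base + (1 if i < extra else 0) for i in range(n_segments)]


def read_nsteps(mdp_text: str) -> int:
    """Return the nsteps value from .mdp text (first matching line)."""
    for line in mdp_text.splitlines():
        m = NSTEPS_RE.match(line)
        if m:
            return int(m.group(2))
    raise ValueError("no nsteps line found")


def with_nsteps(mdp_text: str, nsteps: int) -> str:
    """Return a copy of the .mdp text with every nsteps value replaced."""
    out = []
    for line in mdp_text.splitlines():
        m = NSTEPS_RE.match(line)
        out.append(f"{m.group(1)}{nsteps}{m.group(3)}" if m else line)
    return "\n".join(out) + "\n"


def completed_segments(log_path: Path) -> int:
    """Count segments recorded as finished (one non-empty line per segment)."""
    if not log_path.exists():
        return 0
    return sum(1 for line in log_path.read_text().splitlines() if line.strip())


def run_remaining(
    n_segments: int, log_path: Path, run_segment: Callable[[int], None]
) -> None:
    """Run every segment not yet logged, appending to the log after each one."""
    for i in range(completed_segments(log_path), n_segments):
        run_segment(i)  # placeholder for the real mdrun/tpbconv commands
        # Only reached if run_segment returned normally, i.e. did not raise.
        with log_path.open("a") as fh:
            fh.write(f"{i}\n")
```

Because a segment is written to the log only after run_segment returns, a crash in the middle of a segment leaves that segment unlogged, so rerunning the driver repeats it instead of skipping it. Under PBS, each chained job (as in David's qsub -Wdepend=afterok:prev_jobid suggestion) could call the same driver and carry on from wherever the previous job stopped.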