Hi all, hi Mark,

Here are some more details. The outputs and error messages are attached at the end of this e-mail.

After truncation I get the error message [1a]: GROMACS has problems with the checksum of the trr files. After truncation the trajectories (xtc, trr) all have the same length of 27752 frames [1b]. All the edr files end at the same time, t = 277518 ps (138760 frames) [1b]. The cpt files used after truncation all have step = 138762700 and t = 277525.400000 [1c].

Before truncation I got the error message [2]: GROMACS complains that the 32 subsystems are not compatible.

Does anyone have an idea what is going wrong?
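The checkpoint comparison in [1c] (every replica's cpt must carry the same step) can be scripted instead of read by eye. This is a minimal sketch, assuming the output of `gmxdump -cp stateN.cpt` has been saved as text for each replica and that its header uses the `key = value` layout shown in [1c] and [2]. The helper names are hypothetical, not part of GROMACS.

```python
from typing import Dict, List


def parse_cpt_header(dump_text: str) -> Dict[str, str]:
    """Collect the 'key = value' lines of a saved `gmxdump -cp` header.

    Only the first occurrence of each key is kept, so later parts of the
    dump cannot overwrite header fields such as 'step'.
    """
    header: Dict[str, str] = {}
    for line in dump_text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            header.setdefault(key.strip(), value.strip())
    return header


def steps_differing_from_first(dumps: List[str]) -> Dict[int, int]:
    """Map replica index -> step for every replica whose step differs from replica 0."""
    steps = [int(parse_cpt_header(text)["step"]) for text in dumps]
    return {i: s for i, s in enumerate(steps) if s != steps[0]}


if __name__ == "__main__":
    # Header excerpts copied from the gmxdump output in [1c] and [2];
    # in practice each string would be one replica's saved dump.
    after = "simulation part # = 18\nstep = 138762700\nt = 277525.400000\n"
    before = "simulation part # = 19\nstep = 140849180\nt = 281698.360000\n"
    print(steps_differing_from_first([after, after]))   # {}
    print(steps_differing_from_first([after, before]))  # {1: 140849180}
```

An empty result means all checkpoints agree on the step, which is what [1c] reports after truncation.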
Thanks,
Henri

====1a: AFTER TRUNCATION: ERROR MESSAGE

Reading checkpoint file state1.cpt generated: Thu Jan 27 02:19:50 2011

#PME-nodes mismatch,
  current program: -1
  checkpoint file: 0

Reading checkpoint file state2.cpt generated: Thu Jan 27 02:19:50 2011

#PME-nodes mismatch,
  current program: -1
  checkpoint file: 0

Gromacs binary or parallel settings not identical to previous run.
Continuation is exact, but is not guaranteed to be binary identical.

...
-------------------------------------------------------
Program mdrun_mpi, VERSION 4.5.3
Source code file: checkpoint.c, line: 1767

Fatal error:
Can't read 1048576 bytes of 'traj1.trr' to compute checksum. The file has been replaced or its contents has been modified.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
-------------------------------------------------------
Program mdrun_mpi, VERSION 4.5.3
Source code file: checkpoint.c, line: 1767

Fatal error:
Can't read 1048576 bytes of 'traj2.trr' to compute checksum. The file has been replaced or its contents has been modified.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Error on node 1, will try to stop all the nodes
Halting parallel program mdrun_mpi on CPU 1 out of 32

gcq#307: "Good Music Saves your Soul" (Lemmy)

[n030212:18418] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode -1

====1b: AFTER TRUNCATION: XTC TRR

$ gmxcheck -f traj0.xtc
Checking file traj0.xtc
Reading frame       0 time      0.000
# Atoms  224
Precision 0.001 (nm)
Reading frame   27000 time 270000.000

Item        #frames Timestep (ps)
Step          27752    10
Time          27752    10
Lambda            0
Coords        27752    10
Velocities        0
Forces            0
Box           27752    10
...
$ gmxcheck -f traj31.xtc
Checking file traj31.xtc
Reading frame       0 time      0.000
# Atoms  224
Precision 0.001 (nm)
Reading frame   27000 time 270000.000

Item        #frames Timestep (ps)
Step          27752    10
Time          27752    10
Lambda            0
Coords        27752    10
Velocities        0
Forces            0
Box           27752    10

$ gmxcheck -f traj0.trr
Checking file traj0.trr
trn version: GMX_trn_file (single precision)
Reading frame       0 time      0.000
# Atoms  6647
Reading frame   27000 time 270000.000

Item        #frames Timestep (ps)
Step          27752    10
Time          27752    10
Lambda        27752    10
Coords        27752    10
Velocities    27752    10
Forces            0
Box           27752    10

$ gmxcheck -f traj1.trr
Checking file traj1.trr
trn version: GMX_trn_file (single precision)
Reading frame       0 time      0.000
# Atoms  6647
Reading frame   27000 time 270000.000

Item        #frames Timestep (ps)
Step          27752    10
Time          27752    10
Lambda        27752    10
Coords        27752    10
Velocities    27752    10
Forces            0
Box           27752    10
...
$ gmxcheck -f traj31.trr
Checking file traj31.trr
trn version: GMX_trn_file (single precision)
Reading frame       0 time      0.000
# Atoms  6647
Reading frame   27000 time 270000.000

Item        #frames Timestep (ps)
Step          27752    10
Time          27752    10
Lambda        27752    10
Coords        27752    10
Velocities    27752    10
Forces            0
Box           27752    10

$ eneconv -f ener0.edr
Reading energy frame      0 time      0.000
Continue writing frames from t=0, step=0
Last energy frame read 138759 time 277518.000
iting frame time 276000
Last step written from ener0.edr: t 277518, step 138759000
Last frame written was at step 138759000, time 277518.000000
Wrote 138760 frames
...
$ eneconv -f ener31.edr
Reading energy frame      0 time      0.000
Continue writing frames from t=0, step=0
Last energy frame read 138759 time 277518.000
iting frame time 276000
Last step written from ener31.edr: t 277518, step 138759000
Last frame written was at step 138759000, time 277518.000000
Wrote 138760 frames

====1c: AFTER TRUNCATION: CPT

state0.cpt:
generation time = Thu Jan 27 02:19:50 2011
step = 138762700
t = 277525.400000
...
state31.cpt:
generation time = Thu Jan 27 02:19:50 2011
step = 138762700
t = 277525.400000

$ gmxdump -cp state0.cpt|less
GROMACS version = 4.5.3
GROMACS build time = Fri Dec 3 03:20:53 CET 2010
GROMACS build user = user@cluster
GROMACS build machine = Linux 2.6.18-194.17.4.el5 x86_64
generating program = /opt/gromacs-4.5.3/bin/mdrun_mpi
generation time = Thu Jan 27 02:19:50 2011
checkpoint file version = 12
generating host = n040407
#atoms = 6647
#T-coupling groups = 1
#Nose-Hoover T-chains = 0
#Nose-Hoover T-chains for barostat = 0
integrator = 0
simulation part # = 18
step = 138762700
t = 277525.400000
#PP-nodes = 1
dd_nc[x] = 1
dd_nc[y] = 1
dd_nc[z] = 1
#PME-only nodes = 0
state flags = 6594
ekin data flags = 0
energy history flags = 255

====2: BEFORE TRUNCATION

$ less md2.log
Initializing Replica Exchange
Repl  There are 32 replicas:
Multi-checking the number of atoms ... OK
Multi-checking the integrator ... OK
Multi-checking init_step+nsteps ... OK
Multi-checking first exchange step: init_step/-replex ...
first exchange step: init_step/-replex is not equal for all subsystems
  subsystem 0: 70425
  subsystem 1: 70437
  subsystem 2: 70437
  subsystem 3: 70437
  subsystem 4: 70437
  subsystem 5: 70437
  subsystem 6: 70437
  subsystem 7: 70437
  subsystem 8: 70437
  subsystem 9: 70437
  subsystem 10: 70437
  subsystem 11: 70437
  subsystem 12: 70437
  subsystem 13: 70437
  subsystem 14: 70437
  subsystem 15: 70437
  subsystem 16: 70425
  subsystem 17: 70437
  subsystem 18: 70437
  subsystem 19: 70437
  subsystem 20: 70437
  subsystem 21: 70437
  subsystem 22: 70437
  subsystem 23: 70437
  subsystem 24: 70425
  subsystem 25: 70437
  subsystem 26: 70437
  subsystem 27: 70437
  subsystem 28: 70437
  subsystem 29: 70437
  subsystem 30: 70437
  subsystem 31: 70437

-------------------------------------------------------
Program mdrun_mpi, VERSION 4.5.3
Source code file: main.c, line: 189

Fatal error:
The 32 subsystems are not compatible

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

$ gmxdump -cp state0.cpt|less
GROMACS version = 4.5.3
GROMACS build time = Fri Dec 3 03:20:53 CET 2010
GROMACS build user = user@cluster
GROMACS build machine = Linux 2.6.18-194.17.4.el5 x86_64
generating program = /opt/gromacs-4.5.3/bin/mdrun_mpi
generation time = Thu Jan 27 15:08:32 2011
checkpoint file version = 12
generating host = n040407
#atoms = 6647
#T-coupling groups = 1
#Nose-Hoover T-chains = 0
#Nose-Hoover T-chains for barostat = 0
integrator = 0
simulation part # = 19
step = 140849180
t = 281698.360000
#PP-nodes = 1
dd_nc[x] = 1
dd_nc[y] = 1
dd_nc[z] = 1
#PME-only nodes = 0
state flags = 6594
ekin data flags = 0
...
$ gmxcheck -f traj0.xtc
Reading frame       0 time      0.000
# Atoms  224
Precision 0.001 (nm)
Reading frame   28000 time 280000.000

Item        #frames Timestep (ps)
Step          28170    10
Time          28170    10
Lambda            0
Coords        28170    10
Velocities        0
Forces            0
Box           28170    10

$ gmxcheck -f traj1.xtc
Reading frame       0 time      0.000
# Atoms  224
Precision 0.001 (nm)
Reading frame   28000 time 280000.000

Item        #frames Timestep (ps)
Step          28175    10
Time          28175    10
Lambda            0
Coords        28175    10
Velocities        0
Forces            0
Box           28175    10
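To see why the "first exchange step: init_step/-replex" multi-check in [2] fails, the quantity can be recomputed from init_step. This is a sketch under two assumptions that are not stated in the e-mail: (1) mdrun 4.5.3 compares the ceiling of init_step / replex across replicas, and (2) -replex is 2000 steps. With those assumptions, the step 140849180 from replica 0's part-19 checkpoint reproduces the logged value 70425. The function names are hypothetical.

```python
from typing import Tuple


def first_exchange_step(init_step: int, replex: int) -> int:
    """Ceiling of init_step / replex (assumed form of the mdrun multi-check)."""
    return (init_step + replex - 1) // replex


def init_step_range(reported: int, replex: int) -> Tuple[int, int]:
    """Inclusive range of init_step values that yield a given reported value."""
    return (reported - 1) * replex + 1, reported * replex


REPLEX = 2000  # hypothetical -replex value; the post does not state it

if __name__ == "__main__":
    # init_step of replica 0, taken from `gmxdump -cp state0.cpt` in [2]
    print(first_exchange_step(140849180, REPLEX))  # 70425
    # Replicas reporting 70437 would then have started somewhere in this range
    print(init_step_range(70437, REPLEX))          # (140872001, 140874000)
```

If both assumptions hold, replicas 0, 16 and 24 were restarting from checkpoints about 12 exchange intervals (70437 - 70425) behind the other 29 replicas. In that case the part-19 checkpoints were not written at a common step, even though init_step+nsteps still matched.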