I saw that redmine report, which could be related, but it seems to happen only for runs that do not use domain or particle decomposition.

I'll file a redmine issue. Is there anything I could do to help speed up the fix?

On May 2, 2013, at 11:53 AM, Mark Abraham <mark.j.abra...@gmail.com> wrote:

> On Wed, May 1, 2013 at 10:24 PM, XAvier Periole <x.peri...@rug.nl> wrote:
>
>> Ok, here is my current status on that REMD issue.
>>
>> For info, I use:
>> Temperature: v-rescale, tau_t = 2.0 ps
>> Pressure: Berendsen, tau_p = 5.0 ps
>> Time step: dt = 0.002 - 0.020 ps (2 - 20 fs)
>> COM removal on for bilayer/water separately
>>
>> The symptoms: explosion of the system after 2-5 steps following the swap; the first sign is a huge jump in the LJ interactions and the pressure. This jump seems to be absorbed by the box size and temperature when possible ... see the example I provided earlier. A large VCM (velocity of the centre of mass) is often associated with the crash, but also pressure scaling of more than 1% ...
>>
>> 1- The problem mentioned above remains in gmx-4.5.7, and it might actually have gotten worse. I was able to run a 500 ns simulation with gmx405 using a setup similar to the one for gmx457. The following points all happened with gmx457.
>> 2- It persists with a time step of 2 fs. All tests described below used dt = 2 fs.
>> 3- If I perform, by hand, an exchange that explodes within the REMD run (externally to the REMD machinery, by taking the .gro file, adjusting the temperature in the .mdp and running plain mdrun), it all goes fine.
>> 4- The issue gets much worse when the consecutive replicas differ (different protein conformations, box sizes, etc.) ... explosion at the first exchange.
>> 5- Using Parrinello-Rahman does not help.
>> 6- Switching off the centre-of-mass motion removal does not remove the problem.
>> 7- Switching to the NVT ensemble does not help but makes it worse (crash in 2 steps). All exchanges accepted at the first attempt crash with the message "Large VCM(group SOL): -0.0XXX, -0.XXX, -0.16XXX, Temp-cm: 6.55XXX".
>> 8- Using a unique conformation (the same one) for all replicas in the NVT REMD simulation, after equilibration in the same NVT ensemble (for 1 ns), removes the problem.
>> 9- Taking the equilibrated NVT conformations, equilibrating them in an NPT ensemble (1 ns) and then letting the exchanges go restores the problem ... one exchange is not properly done at the second trial, while the first ones were fine. Well, if errors were made, that was with reasonable ...
>> 10- Note also that the coarse-grained model I use is extremely forgiving, meaning you can perform really nasty transformations and run further after a simple minimisation ... so even abrupt changes in temperature should be fine and relax quickly.
>> 11- When looking at the conformations themselves, nothing appears to have jumped and nothing looks funky.
>>
>> At this point I am not sure what to think and what to do next. There is definitely something not going right during the exchanges.
>
> OK, thanks for the effort. That all agrees with my suspicion that the full state is not being exchanged.
>
>> Has anyone been able to run a REMD simulation in an NPT ensemble without crashes? I would imagine someone has, and something particular to my system is making it go wrong, but I am really wondering what it could be. My feeling is that something related to the box size or pressure is not going across, but it might be something completely different, when the consecutive systems differ reasonably.
>
> I've never tried, but an experiment with a water box might be instructive.
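A side note on why the box and the pressure-coupling state matter to the exchange itself: in constant-pressure REMD the acceptance test involves the reference pressures and the instantaneous box volumes as well as the potential energies, so any part of that state that is not carried across consistently can bias or destabilise the swap. Below is a minimal, generic sketch of the standard NPT acceptance criterion; it is not the GROMACS routine, and the function name, constants and numbers are illustrative only.

/* Standard acceptance test for swapping configurations (and boxes) between
 * two NPT replicas i and j (generic sketch, not GROMACS code):
 *   delta = (beta_i - beta_j)*(U_i - U_j) + (beta_i*P_i - beta_j*P_j)*(V_i - V_j)
 *   accept with probability min(1, exp(delta))
 * beta = 1/(kB*T), U = potential energy, V = box volume, P = reference pressure.
 */
#include <math.h>
#include <stdio.h>

double npt_exchange_probability(double beta_i, double U_i, double P_i, double V_i,
                                double beta_j, double U_j, double P_j, double V_j)
{
    double delta = (beta_i - beta_j) * (U_i - U_j)
                 + (beta_i * P_i - beta_j * P_j) * (V_i - V_j);
    double p = exp(delta);
    return (p < 1.0) ? p : 1.0;
}

int main(void)
{
    /* Illustrative numbers only: energies in kJ/mol, volumes in nm^3,
     * pressures converted from bar to kJ/(mol nm^3). */
    const double kB  = 0.0083145;  /* kJ/(mol K)             */
    const double bar = 0.060221;   /* 1 bar in kJ/(mol nm^3) */
    double beta_300 = 1.0 / (kB * 300.0);
    double beta_310 = 1.0 / (kB * 310.0);
    double p = npt_exchange_probability(beta_300, -5.00e4, 1.0 * bar, 650.0,
                                        beta_310, -4.99e4, 1.0 * bar, 655.0);
    printf("acceptance probability: %g\n", p);
    return 0;
}

If all replicas share the same reference pressure, the pressure term reduces to P*(beta_i - beta_j)*(V_i - V_j), but the box volumes still enter the test, which is one reason an incompletely exchanged box/coupling state is suspicious here.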
>> However, that would suggest that the manner in which the exchanges are made is severely wrong in some cases.
>>
>> Any help to resolve the problem would be greatly appreciated.

> There is an outstanding REMD issue on redmine that could be related (http://redmine.gromacs.org/issues/1191). I'd suggest you open a new issue there, upload a minimal set of .tprs that can reproduce the problem and anything you can think of that might help investigate. For something I'm doing, I'd like to be sure the full T-coupling state is being exchanged, and we may as well kill all the bugs at once.
>
> Mark

>> XAvier.
>>
>> On Apr 26, 2013, at 9:21 AM, Mark Abraham <mark.j.abra...@gmail.com> wrote:
>>
>>> On Thu, Apr 25, 2013 at 11:05 PM, XAvier Periole <x.peri...@rug.nl> wrote:
>>>
>>>> Thanks for the answer. I'll check gmx4.5.7 and report back.
>>>>
>>>> I am not sure what you mean by "GROMACS swaps the coordinates, not the ensemble data". Are the couplings to P and T not exchanged along with them?
>>>
>>> The code in src/kernel/repl_ex.c:
>>>
>>> static void exchange_state(const gmx_multisim_t *ms, int b, t_state *state)
>>> {
>>>     /* When t_state changes, this code should be updated. */
>>>     int ngtc, nnhpres;
>>>     ngtc    = state->ngtc * state->nhchainlength;
>>>     nnhpres = state->nnhpres * state->nhchainlength;
>>>     exchange_rvecs(ms, b, state->box, DIM);
>>>     exchange_rvecs(ms, b, state->box_rel, DIM);
>>>     exchange_rvecs(ms, b, state->boxv, DIM);
>>>     exchange_reals(ms, b, &(state->veta), 1);
>>>     exchange_reals(ms, b, &(state->vol0), 1);
>>>     exchange_rvecs(ms, b, state->svir_prev, DIM);
>>>     exchange_rvecs(ms, b, state->fvir_prev, DIM);
>>>     exchange_rvecs(ms, b, state->pres_prev, DIM);
>>>     exchange_doubles(ms, b, state->nosehoover_xi, ngtc);
>>>     exchange_doubles(ms, b, state->nosehoover_vxi, ngtc);
>>>     exchange_doubles(ms, b, state->nhpres_xi, nnhpres);
>>>     exchange_doubles(ms, b, state->nhpres_vxi, nnhpres);
>>>     exchange_doubles(ms, b, state->therm_integral, state->ngtc);
>>>     exchange_rvecs(ms, b, state->x, state->natoms);
>>>     exchange_rvecs(ms, b, state->v, state->natoms);
>>>     exchange_rvecs(ms, b, state->sd_X, state->natoms);
>>> }
>>>
>>> I mis-stated last night - there *is* exchange of ensemble data, but it is incomplete. In particular, state->ekinstate is not exchanged. Probably it is incomplete because the 9-year-old comment about t_state changing is in a location that nobody changing t_state will see. And serializing a complex C data structure over MPI is tedious at best. But that is not really an excuse for the non-modularity GROMACS has for many of its key data structures. We are working on various workflow and actual code-structure improvements to fix/prevent issues like this, but the proliferation of algorithms that ought to be inter-operable makes the job pretty hard.
>>>
>>> Other codes seem to exchange the ensemble label data (e.g. reference temperatures for T-coupling) because they write trajectories that are continuous with respect to atomic coordinates. I plan to move REMD in GROMACS to this approach, because it scales better, but it will not happen any time soon.
>>>
>>>> That would explain what I see, but let's see what 4.5.7 has to say first.
>>>
>>> Great. It may be that there were other issues in 4.5.3 that exacerbated any REMD problem.
>>>
>>> Mark
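To make the label-exchange approach mentioned above a little more concrete: an accepted swap would trade only the reference temperatures (the ensemble labels) of the two replicas, leaving coordinates, box and coupling state with the replica that generated them, and would typically rescale velocities by sqrt(T_new/T_old) following the usual T-REMD prescription. The sketch below is generic, with made-up types and names; it is not GROMACS code.

/* Label-exchange T-REMD sketch (generic; not the GROMACS implementation).
 * An accepted exchange swaps the reference temperatures of two replicas and
 * rescales their velocities by sqrt(T_new/T_old); coordinates, box and
 * coupling state stay with the replica that generated them, so each
 * trajectory remains continuous in coordinate space. */
#include <math.h>
#include <stddef.h>

typedef struct
{
    double  ref_T;   /* thermostat reference temperature (the ensemble label) */
    size_t  natoms;
    double *v;       /* velocities, 3*natoms values                           */
    /* coordinates, box, coupling variables, ... stay with the replica        */
} replica_state_t;

static void rescale_velocities(replica_state_t *r, double T_old)
{
    double s = sqrt(r->ref_T / T_old);
    for (size_t i = 0; i < 3 * r->natoms; i++)
    {
        r->v[i] *= s;
    }
}

void exchange_labels(replica_state_t *a, replica_state_t *b)
{
    double Ta = a->ref_T;
    double Tb = b->ref_T;
    a->ref_T  = Tb;          /* replica a now samples at b's temperature */
    b->ref_T  = Ta;
    rescale_velocities(a, Ta);
    rescale_velocities(b, Tb);
}

In a real implementation anything else tied to the ensemble (reference pressure, thermostat integrals, lambda, ...) would have to travel with the label, which is exactly the bookkeeping that the coordinate-swap approach concentrates in exchange_state() above.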
>>>> Tks.
>>>>
>>>> On Apr 25, 2013, at 22:40, Mark Abraham <mark.j.abra...@gmail.com> wrote:
>>>>
>>>>> Thanks for the good report. There have been some known issues about the timing of coupling stages with respect to various intervals between GROMACS events for some algorithms. There are a lot of fixed problems in 4.5.7 that are not specific to REMD, but I have a few lingering doubts about whether we should be exchanging (scaled) coupling values along with the coordinates. (Unlike most REMD implementations, GROMACS swaps the coordinates, not the ensemble data.) If you can reproduce those kinds of symptoms in 4.5.7 (whether or not they then crash), then it looks like there may be a problem with the REMD implementation that is perhaps evident only with the kind of large time step Martini takes?
>>>>>
>>>>> Mark
>>>>>
>>>>> On Thu, Apr 25, 2013 at 1:28 PM, XAvier Periole <x.peri...@rug.nl> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have recently been using the REMD code in gmx-407 and gmx-453 and got a few systems crashing for reasons that are unclear so far. The main tests I made used gmx407, but it is all reproducible with gmx453. The crashes were also reproduced (not necessarily at the same time point) on several architectures.
>>>>>>
>>>>>> The system is made of a pair of proteins in a membrane patch, for which the relative orientation is controlled by non-native bonds/angles/dihedrals to perform an umbrella sampling. I use the MARTINI force field, but that might not be relevant here.
>>>>>>
>>>>>> The crashes occur following exchanges that do not seem to be done the correct way, and are preceded by pressure scaling warnings ... indicative of a strong destabilisation of the system and eventual explosion. Some information seems to be exchanged inaccurately.
>>>>>>
>>>>>> Trying to nail down the problem I got stuck, and maybe someone can help. I placed a pdf file showing plots of bonded/nonbonded energies, temperatures, box size etc. around an exchange that does not lead to a crash (here: md.chem.rug.nl/~periole/remd-issue.pdf). I plotted everything at every step, with the temperature colour-coded as indicated in the first figure.
>>>>>>
>>>>>> From the figure it appears that at the step right after the exchange there is a huge jump in the potential energy, coming from its LJ(SR) part. Although there are some small discontinuities in the progression of the bond and angle energies around the exchange, they seem to be fine. The temperature and box size seem to respond to it a few steps later, while the pressure seems to be affected right away, presumably because the jump in Epot affects the virial and thus the pressure.
>>>>>>
>>>>>> The other potential clue is that the jumps are reduced when the pressure coupling is weakened. A tau_p of 1 or 2 ps (Berendsen) will lead to a crash, while 5/10/20 ps won't. Inspection of the time evolution of the Epot, box ... indicates that the magnitude of the jumps is reduced and the system can handle the problem.
>>>>>>
>>>>>> One additional piece of information since I first posted the problem (the post was delayed by the file first attached, now given as a link): the problem is accentuated when the replicas differ in conformation. I am looking at the actual differences as you read this email.
>>>>>>
>>>>>> That is as far as I could go. Any suggestion is welcome.
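A rough illustration of the tau_p observation above: with Berendsen coupling the per-step box scaling in response to a pressure spike is proportional to dt/tau_p, so a weaker coupling (longer tau_p) spreads the shock of a bad exchange over many steps instead of deforming the box at once. The sketch below just evaluates the isotropic Berendsen scaling factor for a hypothetical post-exchange pressure spike; the spike magnitude and constants are illustrative, and this is not the GROMACS routine.

/* Why a longer tau_p damps the post-exchange shock (illustrative only).
 * Isotropic Berendsen pressure coupling scales each box vector per step by
 *   mu = 1 - (dt / (3*tau_p)) * kappa * (P_ref - P)
 * so the deviation of mu from 1 is inversely proportional to tau_p. */
#include <math.h>
#include <stdio.h>

double berendsen_mu(double dt, double tau_p, double kappa, double P_ref, double P)
{
    return 1.0 - (dt / (3.0 * tau_p)) * kappa * (P_ref - P);
}

int main(void)
{
    const double dt     = 0.020;    /* ps  (Martini-like time step)          */
    const double kappa  = 4.5e-5;   /* bar^-1 (compressibility of water)     */
    const double P_ref  = 1.0;      /* bar                                   */
    const double Pspike = 5.0e4;    /* bar, hypothetical post-exchange spike */
    const double taus[] = { 0.5, 1.0, 2.0, 5.0, 10.0, 20.0 };
    const int    ntau   = (int)(sizeof(taus) / sizeof(taus[0]));

    for (int i = 0; i < ntau; i++)
    {
        double mu = berendsen_mu(dt, taus[i], kappa, P_ref, Pspike);
        printf("tau_p = %5.1f ps -> box scaling %.4f (%.2f%% per step)\n",
               taus[i], mu, fabs(mu - 1.0) * 100.0);
    }
    return 0;
}

With these made-up numbers the short tau_p values scale the box by well over 1% in a single step (the regime where GROMACS prints the pressure-scaling warning quoted above), while tau_p = 10-20 ps stays far below it, consistent with the crash/no-crash pattern reported here.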
>>>>>> XAvier.
>>>>>> MD-Group / Univ. of Groningen
>>>>>> The Netherlands