On Thu, Oct 27, 2011 at 4:09 PM, Mark Abraham <mark.abra...@anu.edu.au> wrote:
> On 28/10/2011 9:29 AM, Ben Reynwar wrote:
>>
>> I found a post on the devel list from a couple of weeks ago where a
>> fix is given and it appears to work for me.
>> The link is
>> http://lists.gromacs.org/pipermail/gmx-developers/2011-October/005405.html
>>
>> The fix does not appear to have been incorporated into the 4.5.5 release.
>
> Yes, this is the bug in 4.5.5 to which I referred. Your original post didn't
> state a GROMACS version, but your output suggested 4.5.4, else I'd have told
> you this was the bug... instead everyone wasted some time :)
>
> Mark
>
I thought I was using 4.5.4 too. It turns out I wasn't.

Cheers,
Ben

>> On Tue, Oct 25, 2011 at 4:33 PM, Mark Abraham <mark.abra...@anu.edu.au> wrote:
>>>
>>> On 26/10/2011 6:06 AM, Szilárd Páll wrote:
>>>>
>>>> Hi,
>>>>
>>>> Firstly, you're not using the latest version, and there might have been
>>>> a fix for your issue in the 4.5.5 patch release.
>>>
>>> There was a bug in 4.5.5 that was not present in 4.5.4 that could have
>>> produced such symptoms, but it was fixed without creating a Redmine issue.
>>>
>>>> Secondly, you should check the http://redmine.gromacs.org bug tracker
>>>> to see which bugs have been fixed in 4.5.5 (ideally the target version
>>>> should tell you). You can also just search for REMD and see what
>>>> matching bugs (open or closed) are in the database:
>>>> http://redmine.gromacs.org/search/index/gromacs?issues=1&q=REMD
>>>
>>> If the OP is right that this was with 4.5.4 and it can be reproduced with
>>> 4.5.5, please do some testing (e.g. do different parallel regimes produce
>>> the same symptoms? Can the individual replicas run in a non-REMD
>>> simulation?) and file a Redmine issue with your observations and a small
>>> sample case.
>>>
>>> Mark
>>>
>>>> Cheers,
>>>> --
>>>> Szilárd
>>>>
>>>> On Tue, Oct 25, 2011 at 8:04 PM, Ben Reynwar <b...@reynwar.net> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm getting errors in MPI_Allreduce when I restart an REMD simulation.
>>>>> It has occurred every time I have attempted an REMD restart.
>>>>> I'm posting here to check that there isn't something obviously wrong
>>>>> with the way I'm doing the restart which is causing it.
>>>>>
>>>>> I restart an REMD run using:
>>>>>
>>>>> -------------------------------------------------------------------------
>>>>> basedir=/scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_
>>>>> status=${basedir}/pshsp_andva_run1_.status
>>>>> deffnm=${basedir}/pshsp_andva_run1_
>>>>> cpt=${basedir}/pshsp_andva_run0_.cpt
>>>>> tpr=${basedir}/pshsp_andva_run0_.tpr
>>>>> log=${basedir}/pshsp_andva_run1_0.log
>>>>> n_procs=32
>>>>>
>>>>> echo "about to check if log file exists"
>>>>> if [ ! -e $log ]; then
>>>>>     echo "RUNNING" > $status
>>>>>     source /usr/share/modules/init/bash
>>>>>     module load intel-mpi
>>>>>     module load intel-mkl
>>>>>     module load gromacs
>>>>>     echo "Calling mdrun"
>>>>>     mpirun -np 32 mdrun-mpi -maxh 24 -multi 16 -replex 1000 -s $tpr -cpi $cpt -deffnm $deffnm
>>>>>     retval=$?
>>>>>     if [ $retval != 0 ]; then
>>>>>         echo "ERROR" > $status
>>>>>         exit 1
>>>>>     fi
>>>>>     echo "FINISHED" > $status
>>>>> fi
>>>>> exit 0
>>>>> -------------------------------------------------------------------------
>>>>>
>>>>> mdrun then gets stuck and doesn't output anything until it is
>>>>> terminated by the queuing system.
>>>>> Upon termination the following output is written to stderr.
>>>>>
>>>>> [cli_5]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x2b379c00b770, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_31]: [cli_11]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7f489806bf60, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fd960002fc0, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_7]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x754400, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_9]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x757230, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_27]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fb3cc02a450, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_23]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x750970, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_21]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7007b0, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_3]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x754360, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_29]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x756460, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_19]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7f60a0066850, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_17]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7f4bdc07b690, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_1]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x754430, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_15]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fc31407c830, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_25]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x6e1830, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> [cli_13]: aborting job:
>>>>> Fatal error in MPI_Allreduce: Invalid communicator, error stack:
>>>>> MPI_Allreduce(1175): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x6c2430, count=16, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
>>>>> MPI_Allreduce(1051): Null communicator
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_3.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_0.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_7.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_6.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_1.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_4.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_5.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_2.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_11.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_9.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_8.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_10.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_15.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_13.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_12.tpr, VERSION 4.5.4 (single precision)
>>>>> Reading file /scr2/benreynwar/home-ben-sHSP-REMD-pshsp_andva_run1_/pshsp_andva_run0_14.tpr, VERSION 4.5.4 (single precision)
>>>>> Terminated
>>>>>
>>>>> Cheers,
>>>>> Ben
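
Following up on Mark's suggestion above, here is a minimal sketch of one such isolation test: run a single replica's tpr on its own (no -multi/-replex), then restart it from its own checkpoint. The pshsp_andva_run0_0.tpr name comes from the "Reading file" output above; the single_replica_0 output names and the rank count are assumptions, not part of the original report.

-----------------------------------------------------------------------------
#!/bin/bash
# Sketch only: check whether one replica hangs outside of REMD.
# Assumes the working directory contains pshsp_andva_run0_0.tpr (as in the
# "Reading file" output above); output file names are hypothetical.

# 1) Run replica 0 by itself for a short time, without -multi/-replex.
mpirun -np 2 mdrun-mpi -maxh 1 \
    -s pshsp_andva_run0_0.tpr \
    -deffnm single_replica_0

# 2) Restart the same single replica from its checkpoint.
#    If this also hangs in MPI_Allreduce, the problem is not REMD-specific;
#    if it restarts cleanly, the issue likely lies in the -multi/-replex path.
mpirun -np 2 mdrun-mpi -maxh 1 \
    -s pshsp_andva_run0_0.tpr \
    -cpi single_replica_0.cpt \
    -deffnm single_replica_0
-----------------------------------------------------------------------------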