Hi Mark,

Thanks for the reply!

It seems I got something messed up. Initially I used ‘constraints = all-bonds’ with domain decomposition. When the simulation is scaled to more than 2 processes, an error like the one below occurs:

####################
Fatal error: There is no domain decomposition for 6 nodes that is compatible with the given box and a minimum cell size of 2.06375 nm
Change the number of nodes or mdrun option -rcon or -dds or your LINCS settings
Look in the log file for details on the domain decomposition
####################
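
The error message itself names the mdrun options that control this minimum
cell size. For completeness, a workaround along those lines might look
roughly like the sketch below (the option names are the ones listed by
‘mdrun -h’; the values are purely illustrative, not a command I have verified):

####################
# Sketch only: relax the domain decomposition limits named in the error
# message instead of changing the constraint settings.
#   -rcon : maximum distance used for P-LINCS constraint communication (nm)
#   -dds  : fraction controlling the margin reserved for dynamic load balancing
nohup mpiexec -np 6 mdrun_dmpi -rcon 1.4 -dds 0.9 -s 11_Trun.tpr \
      -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb &
####################

I did not pursue this route, because the constraint change described below
turned out to be enough.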


I referred to the manual and found no answer. Then I switched to ‘particle decomposition’ and tried all kinds of things, including changing MPICH to LAM/MPI, changing Gromacs from 4.0.5 to 4.0.7, and adjusting the mdp file (e.g. ‘constraints = hbonds’ or no PME), but none of these took effect! I thought I had also tried ‘constraints = hbonds’ with ‘domain decomposition’, at least with LAM/MPI.

However, when I tried ‘constraints = hbonds’ with ‘domain decomposition’ under MPICH today, it scaled to more than 2 processes without problems! And now it also scales well under LAM/MPI with ‘constraints = hbonds’ and ‘domain decomposition’!


So it seems the key for ‘domain decomposition’ is ‘constraints = hbonds’.
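
For reference, the constraint-related part of the mdp file that now scales is
roughly the following (lincs_order is the value from my original mdp; the
constraint_algorithm line is just the default, shown for clarity):

####################
constraints              = hbonds   ; constrain only bonds involving hydrogen
constraint_algorithm     = lincs    ; the default
lincs_order              = 10
; with ‘all-bonds’, P-LINCS apparently needs a much larger minimum DD cell
; size, which seems to be what triggered the fatal error above
####################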


Of course, the simulation still crashes when using ‘particle decomposition’ with ‘constraints = hbonds’ or ‘all-bonds’, and I don’t know why.


I use the double-precision version and the NPT ensemble because I want to perform a PCA!
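
(For the PCA itself I plan to use the standard covariance tools shipped with
Gromacs; a minimal sketch, assuming the double-precision binaries carry the
usual ‘_d’ suffix as on my installation, and using the file names from my
mdrun command quoted below:)

####################
# Build the covariance matrix and project the trajectory onto the first
# two eigenvectors (a sketch, not necessarily the exact commands I will run).
g_covar_d  -s 11_Trun.tpr -f 12_NTPmd.trr -o eigenval.xvg -v eigenvec.trr
g_anaeig_d -s 11_Trun.tpr -f 12_NTPmd.trr -v eigenvec.trr -proj proj.xvg -first 1 -last 2
####################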

----- Original Message -----
From: xho...@sohu.com
Date: Tuesday, June 1, 2010 11:53
Subject: [gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, .....” occurs when using the ‘particle decomposition’ option.
To: gmx-users <gmx-users@gromacs.org>

> Hi, everyone of gmx-users,
>
> I met a problem when I use the ‘particle decomposition’ option in an NPT MD
> simulation of Engrailed Homeodomain (En) in a Cl- neutralized water box. It
> just crashed with the error “Fatal error in PMPI_Bcast: Other MPI error,
> error stack: .....”. However, I’ve tried ‘domain decomposition’ and
> everything is ok! I use Gromacs 4.0.5 and 4.0.7, and the MPI lib is
> mpich2-1.2.1p1. The system box size is (5.386 nm)^3. The MDP file is listed
> below:
> ########################################################
> title                    = En
> ;cpp                     = /lib/cpp
> ;include                 = -I../top
> define                   =
> integrator               = md
> dt                       = 0.002
> nsteps                   = 3000000
> nstxout                  = 500
> nstvout                  = 500
> nstlog                   = 250
> nstenergy                = 250
> nstxtcout                = 500
> comm-mode                = Linear
> nstcomm                  = 1
>
> ;xtc_grps                = Protein
> energygrps               = protein non-protein
>
> nstlist                  = 10
> ns_type                  = grid
> pbc                      = xyz        ; default xyz
> ;periodic_molecules      = yes        ; default no
> rlist                    = 1.0
>
> coulombtype              = PME
> rcoulomb                 = 1.0
> vdwtype                  = Cut-off
> rvdw                     = 1.4
> fourierspacing           = 0.12
> fourier_nx               = 0
> fourier_ny               = 0
> fourier_nz               = 0
> pme_order                = 4
> ewald_rtol               = 1e-5
> optimize_fft             = yes
>
> tcoupl                   = v-rescale
> tc_grps                  = protein non-protein
> tau_t                    = 0.1  0.1
> ref_t                    = 298  298
> Pcoupl                   = Parrinello-Rahman
> pcoupltype               = isotropic
> tau_p                    = 0.5
> compressibility          = 4.5e-5
> ref_p                    = 1.0
>
> gen_vel                  = yes
> gen_temp                 = 298
> gen_seed                 = 173529
>
> constraints              = hbonds
> lincs_order              = 10
> ########################################################
>
> When I conduct the MD using “nohup mpiexec -np 2 mdrun_dmpi -s 11_Trun.tpr
> -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e 12_NTPmd_ener.edr
> -cpo 12_NTPstate.cpt &”, everything is OK.
>
> Since the system doesn’t support more than 2 processes under the ‘domain
> decomposition’ option, it took me about 30 days to calculate a 6 ns
> trajectory. Then I decided to use the ‘particle decomposition’ option.

Why no more than 2? What GROMACS version? Why are you using double precision
with temperature coupling? MPICH has known issues. Use OpenMPI.

> The command line is “nohup mpiexec -np 6 mdrun_dmpi -pd -s 11_Trun.tpr
> -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e 12_NTPmd_ener.edr
> -cpo 12_NTPstate.cpt &”. And I got the crash in the nohup file like below:
> ####################
> Fatal error in PMPI_Bcast: Other MPI error, error stack:
> PMPI_Bcast(1302)......................: MPI_Bcast(buf=0x8fedeb0,
> count=60720, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
> MPIR_Bcast(998).......................:
> MPIR_Bcast_scatter_ring_allgather(842):
> MPIR_Bcast_binomial(187)..............:
> MPIC_Send(41).........................:
> MPIC_Wait(513)........................:
> MPIDI_CH3I_Progress(150)..............:
> MPID_nem_mpich2_blocking_recv(948)....:
> MPID_nem_tcp_connpoll(1720)...........:
> state_commrdy_handler(1561)...........:
> MPID_nem_tcp_send_queued(127).........: writev to socket failed - Bad address
> rank 0 in job 25  cluster.cn_52655  caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
> ####################
>
> And the end of the log file is listed below:
> ####################
> ........
> ........
> ........
> ........
>    bQMMM                          = FALSE
>    QMconstraints                  = 0
>    QMMMscheme                     = 0
>    scalefactor                    = 1
> qm_opts:
>    ngQM                           = 0
> ####################
>
> I’ve searched the gmx-users mailing list and tried to adjust the md
> parameters, but no solution was found. The “mpiexec -np x” option doesn’t
> work except when x=1. I did find that when the whole En protein is
> constrained using position restraints (define = -DPOSRES), the ‘particle
> decomposition’ option works. However, this is not the kind of MD I want to
> conduct.
>
> Could anyone help me with this problem? And I also want to know how I can
> accelerate this kind of MD (a long simulation of a small system) using
> Gromacs? Thanks a lot!
>
> (Further information about the simulated system: the system has one En
> protein (54 residues, 629 atoms), a total of 4848 SPC/E waters, and 7 Cl-
> ions used to neutralize the system. The system was minimized first. A 20 ps
> MD was also performed for the waters and ions before EM.)

This should be bread-and-butter with either decomposition, up to at least 16
processors, for a correctly compiled GROMACS with a useful MPI library.

Mark