Andrei Neamtu wrote:
Dear gmx users,
I have problems in running a simulation on several nodes using the
-multi option:
I make the .tpr files for different temperatures (I want to use REMD):
grompp -f param0.mdp -po param0.out.mdp -c conf.gro -p -o
grompp -f param1.mdp -po param1.out.mdp -c conf.gro -p -o
grompp -f param2.mdp -po param2.out.mdp -c conf.gro -p -o
grompp -f paramN.mdp -po paramN.out.mdp -c conf.gro -p -o
where N is the number of nodes in my cluster (P4 with gigabit ethernet)
After the collection of input .tpr files is generated, I start the
simulations with:
mpirun -np N mdrun_mpi -np N -multi -replex 2000 -s sim.tpr -o sim.trr
but the program stops, saying that it cannot find sim1.tpr, sim2.tpr, etc.
It *DOES* find sim0.tpr for the node where I start the simulation, but
for the rest of the nodes it doesn't.
Have you made sure the other nodes are using the same working directory
and/or its contents are being propagated properly? Each MPI process
tries to load a different .tpr and (at least) the correct one needs to
be accessible from any given node.
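
One way to check this is a small shell sketch like the one below, run in the
working directory on each node. It assumes the naming reported in this thread
(with -multi and "-s sim.tpr", replica i reads sim<i>.tpr); missing_tprs is a
hypothetical helper written for illustration, not a GROMACS command.

```shell
#!/bin/sh
# Print the per-replica run input files that are missing from a directory.
# Assumption (taken from the error messages in this thread): "-s sim.tpr"
# together with -multi makes replica i look for sim<i>.tpr.
# missing_tprs is a hypothetical helper, not part of GROMACS.
missing_tprs() {
    dir=$1      # working directory to inspect
    prefix=$2   # file name stem, e.g. "sim"
    n=$3        # number of replicas (N in the mpirun command)
    i=0
    while [ "$i" -lt "$n" ]; do
        f="$dir/$prefix$i.tpr"
        # Report the file only when it is absent
        [ -f "$f" ] || echo "$f"
        i=$((i + 1))
    done
}
```

For example, `missing_tprs . sim 4` prints nothing when sim0.tpr through
sim3.tpr are all present in the current directory. Any path it prints on a
given node is a file that an MPI process on that node could not open.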
I tried to put the simulation on a multi core machine (2 cores) and it
works fine for N=2. But when I link 2 multicore machines the program
stops with the same message except that the first .tpr file not found is
the one corresponding to the second machine.
I saw on the list that there were similar problems in the past but I
didn't find any solution to them. I tried to soft link the mdrun_mpi
executable in the working directory but the problem persists.
That isn't the problem - finding the .tpr is.