On 23/12/2010 10:01 PM, Wojtyczka, André wrote:
Dear Gromacs Enthusiasts.
I am experiencing problems with mdrun_mpi (4.5.3) on a Nehalem cluster.
Problem:
This runs fine:
mpiexec -np 72 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr
This produces a segmentation fault:
mpiexec -np 128 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr
Unless you know you need it, don't use -pd (particle decomposition). DD (domain
decomposition, mdrun's default) will be faster and is probably better bug-tested too.
Mark
So the only difference is the number of cores I am using.
mdrun_mpi was compiled with the Intel compiler 11.1.072 against my own FFTW3
installation.
No errors came up while running configure, make mdrun, or make install-mdrun.
Is there some issue with threading or MPI?
If someone has a clue, please give me a hint. My .mdp settings:
integrator = md
dt = 0.004
nsteps = 25000000
nstxout = 0
nstvout = 0
nstlog = 250000
nstenergy = 250000
nstxtcout = 12500
xtc_grps = protein
energygrps = protein non-protein
nstlist = 2
ns_type = grid
rlist = 0.9
coulombtype = PME
rcoulomb = 0.9
fourierspacing = 0.12
pme_order = 4
ewald_rtol = 1e-5
rvdw = 0.9
pbc = xyz
periodic_molecules = yes
tcoupl = nose-hoover
nsttcouple = 1
tc-grps = protein non-protein
tau_t = 0.1 0.1
ref_t = 310 310
Pcoupl = no
gen_vel = yes
gen_temp = 310
gen_seed = 173529
constraints = all-bonds
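As a quick sanity check on the settings above, the step-based intervals can be converted to simulated time (GROMACS uses ps as its time unit, so dt = 0.004 is 4 fs per step). This is a minimal Python sketch that copies the numbers into a dict by hand; the MDP dict and the run_summary helper are hypothetical illustrations, not GROMACS tools, and nothing here parses the real .mdp file.

```python
# Convert the step counts from the .mdp settings into simulated time.
# Values are copied by hand; nothing here parses the actual .mdp file.

MDP = {
    "dt": 0.004,           # time step in ps (4 fs)
    "nsteps": 25_000_000,  # number of MD steps
    "nstlog": 250_000,     # steps between log-file writes
    "nstenergy": 250_000,  # steps between energy-file writes
    "nstxtcout": 12_500,   # steps between compressed-trajectory (xtc) writes
}


def run_summary(p):
    """Return run length (ns) and output intervals (ps) for the given settings."""
    dt = p["dt"]
    return {
        "total_ns": p["nsteps"] * dt / 1000.0,
        "log_every_ps": p["nstlog"] * dt,
        "energy_every_ps": p["nstenergy"] * dt,
        "xtc_every_ps": p["nstxtcout"] * dt,
    }


if __name__ == "__main__":
    for key, value in run_summary(MDP).items():
        print(f"{key}: {value:g}")
```

With these numbers the run covers 100 ns, writing log and energy data every 1000 ps and an xtc frame every 50 ps.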
Error:
Getting Loaded...
Reading file full031K_mdrun_ions.tpr, VERSION 4.5.3 (single precision)
Loaded with Money
NOTE: The load imbalance in PME FFT and solve is 48%.
For optimal PME load balancing
PME grid_x (144) and grid_y (144) should be divisible by #PME_nodes_x (128)
and PME grid_y (144) and grid_z (144) should be divisible by #PME_nodes_y (1)
Step 0, time 0 (ps)
PSIlogger: Child with rank 82 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 79 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 2 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 1 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 100 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 97 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 98 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 96 exited on signal 6: Aborted
...
PS: For now I don't care about the PME load imbalance, unless it is related to
my problem.
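The NOTE in the log states a simple divisibility rule: the PME grid dimensions (144 here) should be divisible by the number of PME nodes along x. The Python sketch below checks that rule for the two core counts tried. It assumes, as the 128-core NOTE suggests, that with -pd every rank takes part in PME, so #PME_nodes_x equals the -np value; that is not confirmed for the 72-core run, and the check says nothing about whether the imbalance causes the segmentation fault. The helper names are hypothetical.

```python
# Check the PME load-balancing rule quoted in the mdrun NOTE:
# PME grid_x and grid_y should be divisible by #PME_nodes_x
# (the #PME_nodes_y condition is trivially met when it is 1).

PME_GRID_X = 144  # from the NOTE
PME_GRID_Y = 144  # from the NOTE


def pme_grid_divisible(pme_nodes_x, grid_x=PME_GRID_X, grid_y=PME_GRID_Y):
    """True if grid_x and grid_y both split evenly over pme_nodes_x."""
    return grid_x % pme_nodes_x == 0 and grid_y % pme_nodes_x == 0


def balanced_node_counts(grid, max_nodes):
    """All node counts up to max_nodes that divide the grid dimension evenly."""
    return [n for n in range(1, max_nodes + 1) if grid % n == 0]


if __name__ == "__main__":
    # Assumption: every rank does PME, so #PME_nodes_x == number of MPI ranks.
    for ranks in (72, 128):
        print(ranks, pme_grid_divisible(ranks))
    print(balanced_node_counts(PME_GRID_X, 128))
```

Under that assumption, 72 ranks split a 144-point grid evenly (2 planes each) while 128 ranks do not; of all counts up to 128, the largest that divides 144 is 72.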
Cheers
André
Forschungszentrum Juelich GmbH
52425 Juelich