Hi,
Can you give more info about the compilation steps? I just recompiled it
(using the internal libraries, except for fftw) and was able to run an
example (output below). Did I miss something?
I recompiled and ran on a Platform OCS 5 cluster (based on RHEL 5), with
InfiniBand support (OFED).
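In case it helps, the rough sequence I used looked like the lines below
(my prefix, with flags from memory, so treat it as a sketch rather than an
exact transcript):

    # Open MPI 1.2.6 with OpenFabrics (OFED) support
    ./configure --prefix=/home/mbozzore/openmpi --with-openib
    make all install

    # fftw-2.1.5, the only external library I used
    ./configure --prefix=$HOME/fftw-2.1.5 --enable-mpi
    make && make install

    # Quantum ESPRESSO 4.0.1: internal BLAS/LAPACK, external fftw,
    # built with the Open MPI compiler wrappers on the PATH
    export PATH=/home/mbozzore/openmpi/bin:$PATH
    ./configure FFT_LIBS="-L$HOME/fftw-2.1.5/lib -lfftw"
    make pw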
Partial ompi_info output:
Open MPI: 1.2.6
Open MPI SVN revision: r17946
Open RTE: 1.2.6
Open RTE SVN revision: r17946
OPAL: 1.2.6
OPAL SVN revision: r17946
Prefix: /home/mbozzore/openmpi
Configured architecture: x86_64-unknown-linux-gnu
Configured by: mbozzore
Configured on: Mon Aug 11 00:29:15 EDT 2008
Configure host: tyan04.lsf.platform.com
Built by: mbozzore
Built on: Mon Aug 11 00:33:54 EDT 2008
Built host: tyan04.lsf.platform.com
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
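The machinefile used below is just one hostname per line (the names here
are made up); -x LD_LIBRARY_PATH exports that variable to the remote
processes, and --mca btl openib,self restricts Open MPI to the InfiniBand
transport plus self-loopback:

    # ./hosts (hypothetical node names)
    tyan01
    tyan02
    tyan03
    tyan04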
[mbozzore@tyan04 tests]$ mpirun -np 4 --machinefile ./hosts -x LD_LIBRARY_PATH --mca btl openib,self ../bin/pw.x < scf.in
Program PWSCF v.4.0.1 starts ...
Today is 15Aug2008 at 14:51:18
Parallel version (MPI)
Number of processors in use: 4
R & G space division: proc/pool = 4
For Norm-Conserving or Ultrasoft (Vanderbilt) Pseudopotentials or PAW
Current dimensions of program pwscf are:
Max number of different atomic species (ntypx) = 10
Max number of k-points (npk) = 40000
Max angular momentum in pseudopotentials (lmaxx) = 3
Iterative solution of the eigenvalue problem
a parallel distributed memory algorithm will be used,
eigenstates matrixes will be distributed block like on
ortho sub-group = 2* 2 procs
Planes per process (thick) : nr3 = 16 npp = 4 ncplane = 256
Proc/  planes  cols     G    planes  cols     G   columns    G
Pool      (dense grid)          (smooth grid)     (wavefct grid)
  1       4     41    366       4     41    366      13      70
  2       4     41    366       4     41    366      14      71
  3       4     40    362       4     40    362      14      71
  4       4     41    365       4     41    365      14      71
tot      16    163   1459      16    163   1459      55     283
bravais-lattice index = 2
lattice parameter (a_0) = 10.2000 a.u.
unit-cell volume = 265.3020 (a.u.)^3
number of atoms/cell = 2
number of atomic types = 1
number of electrons = 8.00
number of Kohn-Sham states= 4
kinetic-energy cutoff = 12.0000 Ry
charge density cutoff = 48.0000 Ry
convergence threshold = 1.0E-06
mixing beta = 0.7000
number of iterations used = 8 plain mixing
Exchange-correlation = SLA PZ NOGX NOGC (1100)
celldm(1)= 10.200000 celldm(2)= 0.000000 celldm(3)= 0.000000
celldm(4)= 0.000000 celldm(5)= 0.000000 celldm(6)= 0.000000
crystal axes: (cart. coord. in units of a_0)
a(1) = ( -0.500000 0.000000 0.500000 )
a(2) = ( 0.000000 0.500000 0.500000 )
a(3) = ( -0.500000 0.500000 0.000000 )
reciprocal axes: (cart. coord. in units 2 pi/a_0)
b(1) = ( -1.000000 -1.000000 1.000000 )
b(2) = ( 1.000000 1.000000 1.000000 )
b(3) = ( -1.000000 1.000000 -1.000000 )
PseudoPot. # 1 for Si read from file Si.vbc.UPF
Pseudo is Norm-conserving, Zval = 4.0
Generated by new atomic code, or converted to UPF format
Using radial grid of 431 points, 2 beta functions with:
l(1) = 0
l(2) = 1
atomic species   valence    mass      pseudopotential
Si                 4.00   28.08600     Si( 1.00)
48 Sym.Ops. (with inversion)
Cartesian axes
site n. atom positions (a_0 units)
1   Si  tau(  1) = (  0.0000000  0.0000000  0.0000000  )
2   Si  tau(  2) = (  0.2500000  0.2500000  0.2500000  )
number of k points= 2
cart. coord. in units 2pi/a_0
k(  1) = (  0.2500000  0.2500000  0.2500000),  wk =  0.5000000
k(  2) = (  0.2500000  0.2500000  0.7500000),  wk =  1.5000000
G cutoff = 126.4975  ( 1459 G-vectors)  FFT grid: ( 16, 16, 16)
Largest allocated arrays est. size (Mb) dimensions
Kohn-Sham Wavefunctions 0.00 Mb ( 51, 4)
NL pseudopotentials 0.01 Mb ( 51, 8)
Each V/rho on FFT grid 0.02 Mb ( 1024)
Each G-vector array 0.00 Mb ( 366)
G-vector shells 0.00 Mb ( 42)
Largest temporary arrays est. size (Mb) dimensions
Auxiliary wavefunctions 0.01 Mb ( 51, 16)
Each subspace H/S matrix 0.00 Mb ( 16, 16)
Each <psi_i|beta_j> matrix 0.00 Mb ( 8, 4)
Arrays for rho mixing 0.13 Mb ( 1024, 8)
Initial potential from superposition of free atoms
starting charge 7.99901, renormalised to 8.00000
Starting wfc are 8 atomic wfcs
total cpu time spent up to now is 0.10 secs
per-process dynamical memory: 21.9 Mb
Self-consistent Calculation
iteration # 1 ecut= 12.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 1.00E-02, avg # of iterations = 2.0
Threshold (ethr) on eigenvalues was too large:
Diagonalizing with lowered threshold
Davidson diagonalization with overlap
ethr = 7.93E-04, avg # of iterations = 1.0
total cpu time spent up to now is 0.13 secs
total energy = -15.79103983 Ry
Harris-Foulkes estimate = -15.81239602 Ry
estimated scf accuracy < 0.06375741 Ry
iteration # 2 ecut= 12.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 7.97E-04, avg # of iterations = 1.0
total cpu time spent up to now is 0.15 secs
total energy = -15.79409517 Ry
Harris-Foulkes estimate = -15.79442220 Ry
estimated scf accuracy < 0.00230261 Ry
iteration # 3 ecut= 12.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 2.88E-05, avg # of iterations = 2.0
total cpu time spent up to now is 0.17 secs
total energy = -15.79447768 Ry
Harris-Foulkes estimate = -15.79450039 Ry
estimated scf accuracy < 0.00006345 Ry
iteration # 4 ecut= 12.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 7.93E-07, avg # of iterations = 2.0
total cpu time spent up to now is 0.19 secs
total energy = -15.79449472 Ry
Harris-Foulkes estimate = -15.79449644 Ry
estimated scf accuracy < 0.00000455 Ry
iteration # 5 ecut= 12.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 5.69E-08, avg # of iterations = 2.5
total cpu time spent up to now is 0.21 secs
End of self-consistent calculation
k = 0.2500 0.2500 0.2500 ( 180 PWs) bands (ev):
-4.8701 2.3792 5.5371 5.5371
k = 0.2500 0.2500 0.7500 ( 186 PWs) bands (ev):
-2.9165 -0.0653 2.6795 4.0355
! total energy = -15.79449556 Ry
Harris-Foulkes estimate = -15.79449558 Ry
estimated scf accuracy < 0.00000005 Ry
The total energy is the sum of the following terms:
one-electron contribution = 4.83378726 Ry
hartree contribution = 1.08428951 Ry
xc contribution = -4.81281375 Ry
ewald contribution = -16.89975858 Ry
convergence has been achieved in 5 iterations
entering subroutine stress ...
total stress  (Ry/bohr**3)                   (kbar)     P=  -30.30
 -0.00020597   0.00000000   0.00000000      -30.30      0.00      0.00
  0.00000000  -0.00020597   0.00000000        0.00    -30.30      0.00
  0.00000000   0.00000000  -0.00020597        0.00      0.00    -30.30
Writing output data file pwscf.save
PWSCF : 0.28s CPU time, 0.39s wall time
init_run : 0.05s CPU
electrons : 0.11s CPU
stress : 0.00s CPU
Called by init_run:
wfcinit : 0.01s CPU
potinit : 0.00s CPU
Called by electrons:
c_bands : 0.09s CPU ( 6 calls, 0.015 s avg)
sum_band : 0.01s CPU ( 6 calls, 0.001 s avg)
v_of_rho : 0.00s CPU ( 6 calls, 0.001 s avg)
mix_rho : 0.00s CPU ( 6 calls, 0.000 s avg)
Called by c_bands:
init_us_2 : 0.00s CPU ( 28 calls, 0.000 s avg)
cegterg : 0.09s CPU ( 12 calls, 0.007 s avg)
Called by *egterg:
h_psi : 0.01s CPU ( 35 calls, 0.000 s avg)
g_psi : 0.00s CPU ( 21 calls, 0.000 s avg)
cdiaghg : 0.06s CPU ( 31 calls, 0.002 s avg)
Called by h_psi:
add_vuspsi : 0.00s CPU ( 35 calls, 0.000 s avg)
General routines
calbec : 0.00s CPU ( 37 calls, 0.000 s avg)
cft3s : 0.02s CPU ( 354 calls, 0.000 s avg)
davcio : 0.00s CPU ( 40 calls, 0.000 s avg)
Parallel routines
fft_scatter : 0.01s CPU ( 354 calls, 0.000 s avg)
Mehdi Bozzo-Rey <[email protected]>
Open Source Solution Developer
Platform OCS5 <http://www.platform.com/Products/platform-open-cluster-stack5>
Platform Computing
Phone: +1 905 948 4649
From: [email protected] [mailto:[email protected]] On
Behalf Of C.Y. Lee
Sent: August-15-08 1:03 PM
To: [email protected]
Subject: [OMPI users] Segmentation fault (11) Address not mapped (1)
All,
I ran into a problem similar to the one James described in an earlier message:
http://www.open-mpi.org/community/lists/users/2008/07/6204.php
While he was able to solve his problem by recompiling openmpi, I had no
luck on my Red Hat Enterprise 5 system.
Here are two other threads with similar openmpi issues, on Ubuntu and OS X,
that were solved:
https://bugs.launchpad.net/ubuntu/+source/binutils/+bug/234837
http://www.somewhereville.com/?cat=55
Now...
Here is my story:
I had Quantum Espresso (QE) running without problems using openmpi.
However, when I recompiled QE against a recompiled fftw-2.1.5, the build
completed without any errors, but running QE then gave me the error below
(a rough sketch of the rebuild follows the trace):
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x22071b70
[ 0] /lib64/libpthread.so.0 [0x352420de70]
[ 1] /usr/lib64/liblapack.so.3(dsytf2_+0xc43) [0x2aaaaac9f5e3]
[ 2] /usr/lib64/liblapack.so.3(dsytrf_+0x407) [0x2aaaaaca0567]
[ 3] /opt/espresso-4.0.1/bin/pw.x(mix_rho_+0x828) [0x5044b8]
[ 4] /opt/espresso-4.0.1/bin/pw.x(electrons_+0xb37) [0x4eae47]
[ 5] /opt/espresso-4.0.1/bin/pw.x(MAIN__+0xbf) [0x42b3af]
[ 6] /opt/espresso-4.0.1/bin/pw.x(main+0xe) [0x6aad5e]
[ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x352361d8a4]
[ 8] /opt/espresso-4.0.1/bin/pw.x [0x42b239]
*** End of error message ***
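I don't have the exact commands handy, but the rebuild was along these
lines (the /opt prefix matches the paths in the trace; the flags are a
sketch, not a transcript):

    # fftw-2.1.5 with MPI support
    ./configure --prefix=/opt/fftw-2.1.5 --enable-mpi
    make && make install

    # rebuild pw.x against it
    cd /opt/espresso-4.0.1
    ./configure FFT_LIBS="-L/opt/fftw-2.1.5/lib -lfftw"
    make pw

Note that the backtrace dies inside /usr/lib64/liblapack.so.3 (dsytf2_,
called from QE's mix_rho), so the system LAPACK is involved as well, not
just fftw.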
From what I read in the above links, it seems to be a bug in openmpi.
Please share your thoughts on this, thank you!
CY