Hi,
Can you give more info about the compilation steps you used? I just recompiled it (using the internal libraries for everything except FFTW) and was able to run an example (output below). Did I miss something? I recompiled and ran on a Platform OCS 5 cluster (based on RHEL 5) with IB support (OFED).
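In case it helps, building against an external FFTW can be done roughly like this (just a sketch; the prefix path and configure variables below are placeholders to adapt to your own setup, not my exact commands):

    # build FFTW 2.1.5 on its own; this is the only external piece
    cd fftw-2.1.5
    ./configure --prefix=$HOME/sw/fftw-2.1.5
    make && make install

    # configure Quantum Espresso with the Open MPI wrapper compiler and
    # point it at that FFTW; everything else stays internal
    cd espresso-4.0.1
    ./configure MPIF90=mpif90 FFT_LIBS="-L$HOME/sw/fftw-2.1.5/lib -lfftw"
    make pw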
Partial ompi_info:

    Open MPI: 1.2.6
    Open MPI SVN revision: r17946
    Open RTE: 1.2.6
    Open RTE SVN revision: r17946
    OPAL: 1.2.6
    OPAL SVN revision: r17946
    Prefix: /home/mbozzore/openmpi
    Configured architecture: x86_64-unknown-linux-gnu
    Configured by: mbozzore
    Configured on: Mon Aug 11 00:29:15 EDT 2008
    Configure host: tyan04.lsf.platform.com
    Built by: mbozzore
    Built on: Mon Aug 11 00:33:54 EDT 2008
    Built host: tyan04.lsf.platform.com
    C bindings: yes
    C++ bindings: yes
    Fortran77 bindings: yes (all)
    Fortran90 bindings: yes
    Fortran90 bindings size: small
    C compiler: gcc
    C compiler absolute: /usr/bin/gcc
    C++ compiler: g++
    C++ compiler absolute: /usr/bin/g++
    Fortran77 compiler: gfortran
    Fortran77 compiler abs: /usr/bin/gfortran
    Fortran90 compiler: gfortran
    Fortran90 compiler abs: /usr/bin/gfortran

[mbozzore@tyan04 tests]$ mpirun -np 4 --machinefile ./hosts -x LD_LIBRARY_PATH --mca btl openib,self ../bin/pw.x < scf.in

    Program PWSCF v.4.0.1 starts ...
    Today is 15Aug2008 at 14:51:18

    Parallel version (MPI)
    Number of processors in use: 4
    R & G space division: proc/pool = 4

    For Norm-Conserving or Ultrasoft (Vanderbilt) Pseudopotentials or PAW

    Current dimensions of program pwscf are:
    Max number of different atomic species (ntypx) = 10
    Max number of k-points (npk) = 40000
    Max angular momentum in pseudopotentials (lmaxx) = 3

    Iterative solution of the eigenvalue problem
    a parallel distributed memory algorithm will be used,
    eigenstates matrixes will be distributed block like on
    ortho sub-group = 2* 2 procs

    Planes per process (thick) : nr3 = 16 npp = 4 ncplane = 256

    Proc/  planes cols   G    planes cols   G    columns  G
    Pool     (dense grid)      (smooth grid)    (wavefct grid)
      1      4    41   366     4    41   366     13    70
      2      4    41   366     4    41   366     14    71
      3      4    40   362     4    40   362     14    71
      4      4    41   365     4    41   365     14    71
    tot     16   163  1459    16   163  1459     55   283

    bravais-lattice index      =  2
    lattice parameter (a_0)    =  10.2000 a.u.
    unit-cell volume           =  265.3020 (a.u.)^3
    number of atoms/cell       =  2
    number of atomic types     =  1
    number of electrons        =  8.00
    number of Kohn-Sham states =  4
    kinetic-energy cutoff      =  12.0000 Ry
    charge density cutoff      =  48.0000 Ry
    convergence threshold      =  1.0E-06
    mixing beta                =  0.7000
    number of iterations used  =  8  plain mixing
    Exchange-correlation       =  SLA PZ NOGX NOGC (1100)

    celldm(1)= 10.200000  celldm(2)=  0.000000  celldm(3)=  0.000000
    celldm(4)=  0.000000  celldm(5)=  0.000000  celldm(6)=  0.000000

    crystal axes: (cart. coord. in units of a_0)
        a(1) = ( -0.500000  0.000000  0.500000 )
        a(2) = (  0.000000  0.500000  0.500000 )
        a(3) = ( -0.500000  0.500000  0.000000 )

    reciprocal axes: (cart. coord. in units 2 pi/a_0)
        b(1) = ( -1.000000 -1.000000  1.000000 )
        b(2) = (  1.000000  1.000000  1.000000 )
        b(3) = ( -1.000000  1.000000 -1.000000 )

    PseudoPot. # 1 for Si read from file Si.vbc.UPF
    Pseudo is Norm-conserving, Zval = 4.0
    Generated by new atomic code, or converted to UPF format
    Using radial grid of 431 points, 2 beta functions with:
        l(1) = 0
        l(2) = 1

    atomic species   valence   mass       pseudopotential
    Si               4.00      28.08600   Si( 1.00)

    48 Sym.Ops. (with inversion)

    Cartesian axes

    site n.  atom   positions (a_0 units)
       1     Si     tau( 1) = ( 0.0000000  0.0000000  0.0000000 )
       2     Si     tau( 2) = ( 0.2500000  0.2500000  0.2500000 )

    number of k points= 2
        cart. coord. in units 2pi/a_0
        k( 1) = ( 0.2500000  0.2500000  0.2500000), wk = 0.5000000
        k( 2) = ( 0.2500000  0.2500000  0.7500000), wk = 1.5000000

    G cutoff = 126.4975 ( 1459 G-vectors)   FFT grid: ( 16, 16, 16)

    Largest allocated arrays        est. size (Mb)   dimensions
      Kohn-Sham Wavefunctions          0.00 Mb       (   51,  4)
      NL pseudopotentials              0.01 Mb       (   51,  8)
      Each V/rho on FFT grid           0.02 Mb       ( 1024)
      Each G-vector array              0.00 Mb       (  366)
      G-vector shells                  0.00 Mb       (   42)
    Largest temporary arrays        est. size (Mb)   dimensions
      Auxiliary wavefunctions          0.01 Mb       (   51, 16)
      Each subspace H/S matrix         0.00 Mb       (   16, 16)
      Each <psi_i|beta_j> matrix       0.00 Mb       (    8,  4)
      Arrays for rho mixing            0.13 Mb       ( 1024,  8)

    Initial potential from superposition of free atoms

    starting charge 7.99901, renormalised to 8.00000
    Starting wfc are 8 atomic wfcs

    total cpu time spent up to now is 0.10 secs

    per-process dynamical memory: 21.9 Mb

    Self-consistent Calculation

    iteration #  1   ecut= 12.00 Ry   beta=0.70
    Davidson diagonalization with overlap
    ethr = 1.00E-02,  avg # of iterations = 2.0

    Threshold (ethr) on eigenvalues was too large:
    Diagonalizing with lowered threshold

    Davidson diagonalization with overlap
    ethr = 7.93E-04,  avg # of iterations = 1.0

    total cpu time spent up to now is 0.13 secs

    total energy              =  -15.79103983 Ry
    Harris-Foulkes estimate   =  -15.81239602 Ry
    estimated scf accuracy    <    0.06375741 Ry

    iteration #  2   ecut= 12.00 Ry   beta=0.70
    Davidson diagonalization with overlap
    ethr = 7.97E-04,  avg # of iterations = 1.0

    total cpu time spent up to now is 0.15 secs

    total energy              =  -15.79409517 Ry
    Harris-Foulkes estimate   =  -15.79442220 Ry
    estimated scf accuracy    <    0.00230261 Ry

    iteration #  3   ecut= 12.00 Ry   beta=0.70
    Davidson diagonalization with overlap
    ethr = 2.88E-05,  avg # of iterations = 2.0

    total cpu time spent up to now is 0.17 secs

    total energy              =  -15.79447768 Ry
    Harris-Foulkes estimate   =  -15.79450039 Ry
    estimated scf accuracy    <    0.00006345 Ry

    iteration #  4   ecut= 12.00 Ry   beta=0.70
    Davidson diagonalization with overlap
    ethr = 7.93E-07,  avg # of iterations = 2.0

    total cpu time spent up to now is 0.19 secs

    total energy              =  -15.79449472 Ry
    Harris-Foulkes estimate   =  -15.79449644 Ry
    estimated scf accuracy    <    0.00000455 Ry

    iteration #  5   ecut= 12.00 Ry   beta=0.70
    Davidson diagonalization with overlap
    ethr = 5.69E-08,  avg # of iterations = 2.5

    total cpu time spent up to now is 0.21 secs

    End of self-consistent calculation

    k = 0.2500 0.2500 0.2500 ( 180 PWs)   bands (ev):

      -4.8701   2.3792   5.5371   5.5371

    k = 0.2500 0.2500 0.7500 ( 186 PWs)   bands (ev):

      -2.9165  -0.0653   2.6795   4.0355

!   total energy              =  -15.79449556 Ry
    Harris-Foulkes estimate   =  -15.79449558 Ry
    estimated scf accuracy    <    0.00000005 Ry

    The total energy is the sum of the following terms:

    one-electron contribution =    4.83378726 Ry
    hartree contribution      =    1.08428951 Ry
    xc contribution           =   -4.81281375 Ry
    ewald contribution        =  -16.89975858 Ry

    convergence has been achieved in 5 iterations

    entering subroutine stress ...
    total stress (Ry/bohr**3)                    (kbar)    P= -30.30
      -0.00020597   0.00000000   0.00000000     -30.30     0.00     0.00
       0.00000000  -0.00020597   0.00000000       0.00   -30.30     0.00
       0.00000000   0.00000000  -0.00020597       0.00     0.00   -30.30

    Writing output data file pwscf.save

    PWSCF        :     0.28s CPU time,    0.39s wall time

    init_run     :     0.05s CPU
    electrons    :     0.11s CPU
    stress       :     0.00s CPU

    Called by init_run:
    wfcinit      :     0.01s CPU
    potinit      :     0.00s CPU

    Called by electrons:
    c_bands      :     0.09s CPU (   6 calls,   0.015 s avg)
    sum_band     :     0.01s CPU (   6 calls,   0.001 s avg)
    v_of_rho     :     0.00s CPU (   6 calls,   0.001 s avg)
    mix_rho      :     0.00s CPU (   6 calls,   0.000 s avg)

    Called by c_bands:
    init_us_2    :     0.00s CPU (  28 calls,   0.000 s avg)
    cegterg      :     0.09s CPU (  12 calls,   0.007 s avg)

    Called by *egterg:
    h_psi        :     0.01s CPU (  35 calls,   0.000 s avg)
    g_psi        :     0.00s CPU (  21 calls,   0.000 s avg)
    cdiaghg      :     0.06s CPU (  31 calls,   0.002 s avg)

    Called by h_psi:
    add_vuspsi   :     0.00s CPU (  35 calls,   0.000 s avg)

    General routines
    calbec       :     0.00s CPU (  37 calls,   0.000 s avg)
    cft3s        :     0.02s CPU ( 354 calls,   0.000 s avg)
    davcio       :     0.00s CPU (  40 calls,   0.000 s avg)

    Parallel routines
    fft_scatter  :     0.01s CPU ( 354 calls,   0.000 s avg)

Mehdi Bozzo-Rey <mailto:mbozz...@platform.com>
Open Source Solution Developer
Platform OCS5 <http://www.platform.com/Products/platform-open-cluster-stack5>
Platform Computing
Phone: +1 905 948 4649

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of C.Y. Lee
Sent: August-15-08 1:03 PM
To: us...@open-mpi.org
Subject: [OMPI users] Segmentation fault (11) Address not mapped (1)

All,

I had a problem similar to the one James described in an earlier message:
http://www.open-mpi.org/community/lists/users/2008/07/6204.php

While he was able to solve the problem by recompiling openmpi, I had no luck on my Red Hat Enterprise 5 system. Here are two other threads with similar openmpi issues, on Ubuntu and OS X, that were solved:
https://bugs.launchpad.net/ubuntu/+source/binutils/+bug/234837
http://www.somewhereville.com/?cat=55

Now, here is my story: I had Quantum Espresso (QE) running without problems using openmpi. However, when I recompiled QE against a recompiled fftw-2.1.5, it built without any errors, but when I ran QE it gave me the error below:

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x22071b70
[ 0] /lib64/libpthread.so.0 [0x352420de70]
[ 1] /usr/lib64/liblapack.so.3(dsytf2_+0xc43) [0x2aaaaac9f5e3]
[ 2] /usr/lib64/liblapack.so.3(dsytrf_+0x407) [0x2aaaaaca0567]
[ 3] /opt/espresso-4.0.1/bin/pw.x(mix_rho_+0x828) [0x5044b8]
[ 4] /opt/espresso-4.0.1/bin/pw.x(electrons_+0xb37) [0x4eae47]
[ 5] /opt/espresso-4.0.1/bin/pw.x(MAIN__+0xbf) [0x42b3af]
[ 6] /opt/espresso-4.0.1/bin/pw.x(main+0xe) [0x6aad5e]
[ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x352361d8a4]
[ 8] /opt/espresso-4.0.1/bin/pw.x [0x42b239]
*** End of error message ***

From what I read in the above links, it seems to be a bug in openmpi. Please share your thoughts on this, thank you!

CY
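A quick check that may help narrow this down is to see which LAPACK/BLAS, FFTW and MPI libraries the pw.x binary from the backtrace actually resolves to at run time, for example (the path is the one shown in the backtrace above; adjust if your install differs):

    # list the MPI, LAPACK/BLAS and FFTW libraries pw.x is linked against
    ldd /opt/espresso-4.0.1/bin/pw.x | egrep 'mpi|lapack|blas|fftw'

    # confirm which Open MPI wrapper/launcher is on the PATH
    which mpif90 mpirun
    ompi_info | head

If pw.x is picking up the system /usr/lib64/liblapack.so.3 rather than the libraries it was configured against, the crash inside dsytrf_ may not involve Open MPI at all.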