To run EPW, the command for the preliminary nscf run is (http://epw.org.uk/Documentation/B-dopedDiamond):
~/bin/openmpi-v3.0/bin/mpiexec -np 64 /home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > nscf.out

So I submitted it with the following command:

~/bin/openmpi-v3.0/bin/mpiexec --mca io romio314 -np 64 /home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > nscf.out

And it crashed like the first time. It is interesting that the preliminary scf run works fine. The scf run has Quantum Espresso generate the k points automatically, as shown below:

K_POINTS (automatic)
12 12 12 0 0 0

The nscf run, which crashes, includes an explicit list of k points (1728 in this case), as seen below:

K_POINTS (crystal)
1728
0.00000000 0.00000000 0.00000000 5.787037e-04
0.00000000 0.00000000 0.08333333 5.787037e-04
0.00000000 0.00000000 0.16666667 5.787037e-04
0.00000000 0.00000000 0.25000000 5.787037e-04
0.00000000 0.00000000 0.33333333 5.787037e-04
0.00000000 0.00000000 0.41666667 5.787037e-04
0.00000000 0.00000000 0.50000000 5.787037e-04
0.00000000 0.00000000 0.58333333 5.787037e-04
0.00000000 0.00000000 0.66666667 5.787037e-04
0.00000000 0.00000000 0.75000000 5.787037e-04
...

To build openmpi (either 1.10.7 or 3.0.x), I loaded the Fortran compiler module, configured with only "--prefix=", and then ran "make all install". I did not enable or disable any other options.

Cheers,
Vahid

On Jan 19, 2018, at 10:23 AM, Edgar Gabriel <egabr...@central.uh.edu> wrote:

Thanks, that is interesting. Since /scratch is a Lustre file system, Open MPI should actually utilize romio314 for that anyway, not ompio. What I have seen happen on at least one occasion, however, is that ompio was still used because (I suspect) romio314 didn't pick up the configuration options correctly. It is a little bit of a mess from that perspective that we have to pass the ROMIO arguments with different flags/options than for ompio, e.g.
--with-lustre=/path/to/lustre/ --with-io-romio-flags="--with-file-system=ufs+nfs+lustre --with-lustre=/path/to/lustre"

ompio should pick up the Lustre options correctly if the Lustre headers/libraries are found at the default location, even if the user did not pass the --with-lustre option. I am not entirely sure what happens in ROMIO if the user did not pass --with-file-system=ufs+nfs+lustre but the Lustre headers/libraries are found at the default location, i.e. whether the Lustre adio component is still compiled or not. Anyway, let's wait for the outcome of your run enforcing the romio314 component, and I will still try to reproduce your problem on my system.

Thanks
Edgar

On 1/19/2018 7:15 AM, Vahid Askarpour wrote:

Gilles,

I have submitted that job with --mca io romio314. If it finishes, I will let you know. It is sitting in Conte's queue at Purdue.

As to Edgar's question about the file system, here is the output of df -Th:

vaskarpo@conte-fe00:~ $ df -Th
Filesystem                                                                 Type    Size  Used Avail Use% Mounted on
/dev/sda1                                                                  ext4    435G   16G  398G   4% /
tmpfs                                                                      tmpfs    16G  1.4M   16G   1% /dev/shm
persistent-nfs.rcac.purdue.edu:/persistent/home                            nfs      80T   64T   17T  80% /home
persistent-nfs.rcac.purdue.edu:/persistent/apps                            nfs     8.0T  4.0T  4.1T  49% /apps
mds-d01-ib.rcac.purdue.edu@o2ib1:mds-d02-ib.rcac.purdue.edu@o2ib1:/lustreD lustre  1.4P  994T  347T  75% /scratch/conte
depotint-nfs.rcac.purdue.edu:/depot                                        nfs     4.5P  3.0P  1.6P  66% /depot
172.18.84.186:/persistent/fsadmin                                          nfs     200G  130G   71G  65% /usr/rmt_share/fsadmin

The code is compiled in my $HOME and is run on the scratch.
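As a concrete sketch of the configure invocation Edgar describes, combining the ompio-side and ROMIO-side Lustre flags in one build (the Lustre path and install prefix below are placeholders, not taken from the thread):

```shell
# Sketch: building Open MPI so that both ompio and the romio314 component
# are configured with Lustre support. /path/to/lustre and the --prefix
# value are placeholders; substitute your own locations.
./configure --prefix="$HOME/bin/openmpi-v3.0" \
    --with-lustre=/path/to/lustre \
    --with-io-romio-flags="--with-file-system=ufs+nfs+lustre --with-lustre=/path/to/lustre"
make all install
```

Note the asymmetry Edgar points out: --with-lustre is consumed by ompio directly, while ROMIO only sees whatever is forwarded through --with-io-romio-flags.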
Cheers,
Vahid

On Jan 18, 2018, at 10:14 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Vahid,

In the v1.10 series, the default MPI-IO component was ROMIO based, and in the v3 series, it is now ompio. You can force the latest Open MPI to use the ROMIO based component with

mpirun --mca io romio314 ...

That being said, your description (e.g. a hand-edited file) suggests that I/O is not performed with MPI-IO, which makes me very puzzled as to why the latest Open MPI is crashing.

Cheers,
Gilles

On Fri, Jan 19, 2018 at 10:55 AM, Edgar Gabriel <egabr...@central.uh.edu> wrote:

I will try to reproduce this problem with 3.0.x, but it might take me a couple of days to get to it. Since it seemed to have worked with 2.0.x (except for the running-out-of-file-handles problem), there is the suspicion that one of the fixes we introduced since then is the problem. What file system did you run it on? NFS?

Thanks
Edgar

On 1/18/2018 5:17 PM, Jeff Squyres (jsquyres) wrote:

On Jan 18, 2018, at 5:53 PM, Vahid Askarpour <vh261...@dal.ca> wrote:

My openmpi3.0.x run (called the nscf run) was reading data from a routine Quantum Espresso input file edited by hand. The preliminary run (called the scf run) was done with openmpi3.0.x on a similar input file, also edited by hand.

Gotcha. Well, that's a little disappointing. It would be good to understand why it is crashing -- is the app doing something that is accidentally not standard? Is there a bug in (soon to be released) Open MPI 3.0.1? ...?
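For reference, the 1728-point K_POINTS (crystal) list from the failing nscf input quoted earlier in the thread can be regenerated with a one-liner; this is a sketch assuming a uniform 12x12x12 grid with equal weights of 1/1728 (which matches the coordinates and the 5.787037e-04 weight shown above). The awk script and the kpoints.txt filename are mine, not from the thread:

```shell
# Regenerate the uniform 12x12x12 k-point grid (crystal coordinates),
# 1728 points each with weight 1/1728, in the format pw.x expects.
awk 'BEGIN {
  n = 12
  printf "K_POINTS (crystal)\n%d\n", n*n*n
  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
      for (k = 0; k < n; k++)
        printf "%.8f %.8f %.8f %e\n", i/n, j/n, k/n, 1/(n*n*n)
}' > kpoints.txt
```

The resulting block can be pasted into (or catted onto) the nscf input in place of a hand-edited list, which avoids typos in the 1728 lines.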
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users