The run is not consistent. I manually tested "mpirun -np 4 pw.x -i mos2.rlx.in" on the compute-0-2 and rocks7 nodes and it works fine. However, with the script line "srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in" I see errors in the output file, and the job aborts after about 60 seconds.
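For reference, here is my understanding of the heterogeneous-job layout behind that srun line, as a sketch (the #SBATCH resource lines below are assumptions for illustration, not my exact script):

    #!/bin/bash
    #SBATCH --job-name=mos2-rlx
    #SBATCH --ntasks=2          # pack 0 (intended for compute-0-2)
    #SBATCH packjob
    #SBATCH --ntasks=4          # pack 1 (intended for rocks7)

    # One srun, one pack-group per job component, separated by ":"
    srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in

The "packjob" separator is the Slurm 18.08 way of declaring the second component of a heterogeneous job; each --pack-group in srun then addresses one component.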
The errors are about files not being found. Although the input file uses absolute paths for the intermediate files, and the files do exist, the errors seem bizarre.

On compute-0-2:

     PID USER     PR NI    VIRT    RES  SHR S  %CPU %MEM   TIME+ COMMAND
    3387 ghatee   20  0 1930488 129684 8336 R 100.0  0.2 0:09.71 pw.x
    3388 ghatee   20  0 1930476 129700 8336 R  99.7  0.2 0:09.68 pw.x

On rocks7:

     PID USER     PR NI    VIRT    RES  SHR S  %CPU %MEM   TIME+ COMMAND
    5592 ghatee   20  0 1930568 127764 8336 R 100.0  0.2 0:17.29 pw.x
     549 ghatee   20  0  116844   3652 1804 S   0.0  0.0 0:00.14 bash

As you can see, the 2 tasks are fine on compute-0-2, but there should be 4 tasks on rocks7 (only one pw.x is running there).

The input file contains

    outdir = "/home/ghatee/job/2h-unitcell"
    pseudo_dir = "/home/ghatee/q-e-qe-5.4/pseudo/"

The output file says (the startup banner below, with timestamps 11:43:58 and 20:13:58, appears six times in total; I show it once):

    Program PWSCF v.6.2 starts on 28Mar2019 at 11:43:58

    This program is part of the open-source Quantum ESPRESSO suite
    for quantum simulation of materials; please cite
        "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
        "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
         URL http://www.quantum-espresso.org",
    in publications or presentations arising from this work. More details at
    http://www.quantum-espresso.org/quote

    Parallel version (MPI), running on 1 processors
    MPI processes distributed on 1 nodes
    Reading input from mos2.rlx.in
    Warning: card &CELL ignored
    Warning: card CELL_DYNAMICS = "BFGS" ignored
    Warning: card PRESS_CONV_THR = 5.00000E-01 ignored
    Warning: card / ignored

    Current dimensions of program PWSCF are:
    Max number of different atomic species (ntypx) = 10
    Max number of k-points (npk) = 40000
    Max angular momentum in pseudopotentials (lmaxx) = 3

    file Mo.revpbe-spn-rrkjus_psl.0.3.0.UPF: wavefunction(s) 4S renormalized
    (the line above is repeated six times)

    ERROR(FoX) Cannot open file
    ERROR(FoX) Cannot open file

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    Error in routine read_ncpp (2):
    pseudo file is empty or wrong
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    stopping ...
    --------------------------------------------------------------------------
    MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
    with errorcode 1.
    ...

Counting, there are indeed 6 "Parallel version (MPI), running on 1 processors" lines, so the number of started processes matches what I specified in the Slurm script.
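That count can be double-checked directly (the output filename mos2.rlx.out is an assumption here; substitute whatever the job actually writes):

    # Count pw.x startup banners in the output file. Six matches, each
    # reporting "1 processors", would mean six independent serial
    # instances rather than one multi-rank MPI job.
    grep -c "Parallel version (MPI), running on" mos2.rlx.out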
However, I am suspicious that this is NOT running as a multicore MPI job. It looks like 6 instances of a serial run, and there may be some races between them during the run. Any thoughts?

Regards,
Mahmood

On Thu, Mar 28, 2019 at 3:59 PM Frava <fravad...@gmail.com> wrote:
> I didn't receive the last mail from Mahmood, but Marcus is right: Mahmood's
> heterogeneous job submission seems to be working now.
> Well, separating each pack in the srun command and asking for the correct
> number of tasks to be launched for each pack is the way I figured
> heterogeneous jobs worked with SLURM v18.08.0 (I didn't test it with more
> recent SLURM versions).
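One way to see what srun actually launches per pack is to replace pw.x with a small probe (a sketch I have not run on this cluster; SLURM_PROCID and SLURM_NTASKS are the standard Slurm per-task variables):

    # Print hostname and rank layout for each launched task
    srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 \
         bash -c 'echo "$(hostname): rank $SLURM_PROCID of $SLURM_NTASKS"'

If each pack reports its own small rank space instead of one combined 6-rank job, that would support the "independent serial instances" suspicion.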