Dear all, I am getting a segmentation fault error when using DFT+U calculations as below.

[n3511-027:2614825] 127 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[n3511-027:2614825] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[n3511-027:2614833:0:2614833] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffff80a1adb8)
==== backtrace (tid:2614833) ====
 0 /lib64/libucs.so.0(ucs_handle_error+0x2dc) [0x14bef008cedc]
 1 /lib64/libucs.so.0(+0x2b0bc) [0x14bef008d0bc]
 2 /lib64/libucs.so.0(+0x2b28a) [0x14bef008d28a]
 3 /home/fs71766/waldhoer02/data/tools/vsc5/base_libs/install/openmpi-4.1.1/lib/libmpi.so.40(MPI_Bcast+0x58) [0x14bf04d4e598]
 4 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_blacs_openmpi_lp64.so.2(MKLMPI_Bcast+0xdd) [0x14bf0c5a6dfd]
 5 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(PB_CpgemmMPI+0x1097) [0x14bf0c1bbad7]
 6 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdgemm_+0xda7) [0x14bf0c21ff07]
 7 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdlaed1_+0x7cd) [0x14bf0bcc6dbd]
 8 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdlaed0_+0x9a1) [0x14bf0bcc6551]
 9 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pdstedc_+0x639) [0x14bf0bcd0f19]
10 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(mkl_pzheevd0_+0xf99) [0x14bf0bf377c9]
11 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(mkl_pzheevdm_+0xb99) [0x14bf0bf36379]
12 /gpfs/opt/sw/vsc4/VSC/x86_64/glibc-2.17/skylake/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_scalapack_lp64.so.2(pzheevd_+0x3ca) [0x14bf0bf3553a]
13 pw.x() [0xbb9d5a]
14 pw.x() [0xb9bb82]
15 pw.x() [0x71de4b]
16 pw.x() [0x5a03a9]
17 pw.x() [0x5a47c4]
18 pw.x() [0x412a44]
19 pw.x() [0x41caf9]
20 pw.x() [0x4f955b]
21 pw.x() [0x40688c]
22 pw.x() [0x4065cd]
23 /lib64/libc.so.6(__libc_start_main+0xe5) [0x14bf033d8d85]
24 pw.x() [0x40660e]
=================================

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x14bf033ecb4f in ???
#1  0x14bf04d4e598 in ompi_comm_invalid at ../../../../ompi/communicator/communicator.h:341
#2  0x14bf04d4e598 in PMPI_Bcast at /home/fs71766/waldhoer02/data/tools/vsc5/base_libs/build/openmpi-4.1.1/ompi/mpi/c/profile/pbcast.c:72
#3  0x14bf0c5a6dfc in ???
#4  0x14bf0c1bbad6 in ???
#5  0x14bf0c21ff06 in ???
#6  0x14bf0bcc6dbc in ???
#7  0x14bf0bcc6550 in ???
#8  0x14bf0bcd0f18 in ???
#9  0x14bf0bf377c8 in ???
#10 0x14bf0bf36378 in ???
#11 0x14bf0bf35539 in ???
#12 0xbb9d59 in __zhpev_module_MOD_pzheevd_drv at /home/fs71287/gentles/tools/sc-qe-7.3/LAXlib/zhpev_drv.f90:1562
#13 0xb9bb81 in laxlib_pcdiaghg_ at /home/fs71287/gentles/tools/sc-qe-7.3/LAXlib/cdiaghg.f90:587
#14 0x71de4a in pcegterg_ at /home/fs71287/gentles/tools/sc-qe-7.3/KS_Solvers/Davidson/cegterg.f90:944
#15 0x5a03a8 in diag_bands_k at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/c_bands.f90:1030
#16 0x5a03a8 in diag_bands_ at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/c_bands.f90:322
#17 0x5a47c3 in c_bands_ at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/c_bands.f90:132
#18 0x412a43 in electrons_scf_ at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/electrons.f90:689
#19 0x41caf8 in electrons_ at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/electrons.f90:192
#20 0x4f955a in run_pwscf_ at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/run_pwscf.f90:189
#21 0x40688b in pwscf at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/pwscf.f90:85
#22 0x4065cc in main at /home/fs71287/gentles/tools/sc-qe-7.3/PW/src/pwscf.f90:40

I have been using 128 atom
supercells of InGaAsSb in DFT+U calculations. The version is qe-7.3. I have seen reports of similar problems with a few previous versions, but I am not sure whether those fixes are still appropriate given the changes to the DFT+U code. It seems to be a problem with loading the environment. I am running on a supercomputer with 128 processors. The input file is:

&CONTROL
  calculation = 'vc-relax',
  disk_io = 'low',
  etot_conv_thr = 1d-05,
  forc_conv_thr = 0.001,
  outdir = './tmp_In0.0Ga1.0As0.75Sb0.25_4x4x4',
  prefix = 'In0.0Ga1.0As0.75Sb0.25_4x4x4',
  pseudo_dir = '/home/fs71287/gentles/data/pseudos/',
  restart_mode = 'from_scratch',
  verbosity = 'low',
/
&SYSTEM
  celldm(1) = 43.6265,
  degauss = 0.0001,
  ecutwfc = 100,
  ibrav = 2,
  lspinorb = .TRUE.,
  nat = 128,
  nbnd = 1792,
  noncolin = .TRUE.,
  ntyp = 20,
  occupations = 'smearing',
/
&ELECTRONS
  conv_thr = 1d-05,
  mixing_beta = 0.65,
/
&IONS
/
&CELL
  cell_dofree = 'ibrav',
/
ATOMIC_SPECIES
As1 74.9216 As.pbe.NC-FR.standard.v0.4.UPF
As0 74.9216 As.pbe.NC-FR.standard.v0.4.UPF
As2 74.9216 As.pbe.NC-FR.standard.v0.4.UPF
As3 74.9216 As.pbe.NC-FR.standard.v0.4.UPF
As4 74.9216 As.pbe.NC-FR.standard.v0.4.UPF
Ga1 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
Ga0 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
Ga2 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
Ga3 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
Ga4 69.7230 Ga.pbe.NC-FR.standard.v0.4.UPF
In1 114.8180 In.pbe.NC-FR.standard.v0.4.UPF
In0 114.8180 In.pbe.NC-FR.standard.v0.4.UPF
In2 114.8180 In.pbe.NC-FR.standard.v0.4.UPF
In3 114.8180 In.pbe.NC-FR.standard.v0.4.UPF
In4 114.8180 In.pbe.NC-FR.standard.v0.4.UPF
Sb1 121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
Sb0 121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
Sb2 121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
Sb3 121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
Sb4 121.7600 Sb.pbe.NC-FR.standard.v0.4.UPF
ATOMIC_POSITIONS crystal
Ga4 0.000000 0.000000 0.000000
As4 0.062500 0.062500 0.062500
Ga3 0.000000 0.000000 0.250000
As4 0.062500 0.062500 0.312500
Ga4 0.000000 0.000000 0.500000
As4 0.062500 0.062500 0.562500
Ga4 0.000000 0.000000 0.750000
As4 0.062500 0.062500 0.812500
Ga4 0.000000 0.250000 0.000000
As4 0.062500 0.312500 0.062500
Ga4 0.000000 0.250000 0.250000
As4 0.062500 0.312500 0.312500
Ga2 0.000000 0.250000 0.500000
Sb4 0.062500 0.312500 0.562500
Ga2 0.000000 0.250000 0.750000
As4 0.062500 0.312500 0.812500
Ga4 0.000000 0.500000 0.000000
As4 0.062500 0.562500 0.062500
Ga4 0.000000 0.500000 0.250000
As4 0.062500 0.562500 0.312500
Ga1 0.000000 0.500000 0.500000
Sb4 0.062500 0.562500 0.562500
Ga3 0.000000 0.500000 0.750000
As4 0.062500 0.562500 0.812500
Ga4 0.000000 0.750000 0.000000
As4 0.062500 0.812500 0.062500
Ga2 0.000000 0.750000 0.250000
Sb4 0.062500 0.812500 0.312500
Ga2 0.000000 0.750000 0.500000
As4 0.062500 0.812500 0.562500
Ga4 0.000000 0.750000 0.750000
As4 0.062500 0.812500 0.812500
Ga4 0.250000 0.000000 0.000000
As4 0.312500 0.062500 0.062500
Ga3 0.250000 0.000000 0.250000
As4 0.312500 0.062500 0.312500
Ga4 0.250000 0.000000 0.500000
As4 0.312500 0.062500 0.562500
Ga4 0.250000 0.000000 0.750000
As4 0.312500 0.062500 0.812500
Ga3 0.250000 0.250000 0.000000
As4 0.312500 0.312500 0.062500
Ga4 0.250000 0.250000 0.250000
As4 0.312500 0.312500 0.312500
Ga2 0.250000 0.250000 0.500000
Sb4 0.312500 0.312500 0.562500
Ga2 0.250000 0.250000 0.750000
Sb4 0.312500 0.312500 0.812500
Ga4 0.250000 0.500000 0.000000
As4 0.312500 0.562500 0.062500
Ga4 0.250000 0.500000 0.250000
As4 0.312500 0.562500 0.312500
Ga1 0.250000 0.500000 0.500000
Sb4 0.312500 0.562500 0.562500
Ga2 0.250000 0.500000 0.750000
As4 0.312500 0.562500 0.812500
Ga4 0.250000 0.750000 0.000000
As4 0.312500 0.812500 0.062500
Ga2 0.250000 0.750000 0.250000
Sb4 0.312500 0.812500 0.312500
Ga2 0.250000 0.750000 0.500000
As4 0.312500 0.812500 0.562500
Ga4 0.250000 0.750000 0.750000
As4 0.312500 0.812500 0.812500
Ga4 0.500000 0.000000 0.000000
As4 0.562500 0.062500 0.062500
Ga3 0.500000 0.000000 0.250000
As4 0.562500 0.062500 0.312500
Ga4 0.500000 0.000000 0.500000
As4 0.562500 0.062500 0.562500
Ga4 0.500000 0.000000 0.750000
As4 0.562500 0.062500 0.812500
Ga3 0.500000 0.250000 0.000000
As4 0.562500 0.312500 0.062500
Ga4 0.500000 0.250000 0.250000
As4 0.562500 0.312500 0.312500
Ga2 0.500000 0.250000 0.500000
Sb4 0.562500 0.312500 0.562500
Ga1 0.500000 0.250000 0.750000
Sb4 0.562500 0.312500 0.812500
Ga4 0.500000 0.500000 0.000000
As4 0.562500 0.562500 0.062500
Ga3 0.500000 0.500000 0.250000
Sb4 0.562500 0.562500 0.312500
Ga0 0.500000 0.500000 0.500000
Sb4 0.562500 0.562500 0.562500
Ga2 0.500000 0.500000 0.750000
As4 0.562500 0.562500 0.812500
Ga4 0.500000 0.750000 0.000000
As4 0.562500 0.812500 0.062500
Ga1 0.500000 0.750000 0.250000
Sb4 0.562500 0.812500 0.312500
Ga2 0.500000 0.750000 0.500000
As4 0.562500 0.812500 0.562500
Ga4 0.500000 0.750000 0.750000
As4 0.562500 0.812500 0.812500
Ga4 0.750000 0.000000 0.000000
As4 0.812500 0.062500 0.062500
Ga3 0.750000 0.000000 0.250000
As4 0.812500 0.062500 0.312500
Ga4 0.750000 0.000000 0.500000
As4 0.812500 0.062500 0.562500
Ga4 0.750000 0.000000 0.750000
As4 0.812500 0.062500 0.812500
Ga3 0.750000 0.250000 0.000000
As4 0.812500 0.312500 0.062500
Ga4 0.750000 0.250000 0.250000
As4 0.812500 0.312500 0.312500
Ga2 0.750000 0.250000 0.500000
Sb4 0.812500 0.312500 0.562500
Ga1 0.750000 0.250000 0.750000
Sb4 0.812500 0.312500 0.812500
Ga4 0.750000 0.500000 0.000000
As4 0.812500 0.562500 0.062500
Ga3 0.750000 0.500000 0.250000
As4 0.812500 0.562500 0.312500
Ga1 0.750000 0.500000 0.500000
Sb4 0.812500 0.562500 0.562500
Ga2 0.750000 0.500000 0.750000
As4 0.812500 0.562500 0.812500
Ga4 0.750000 0.750000 0.000000
As4 0.812500 0.812500 0.062500
Ga2 0.750000 0.750000 0.250000
Sb4 0.812500 0.812500 0.312500
Ga2 0.750000 0.750000 0.500000
As4 0.812500 0.812500 0.562500
Ga4 0.750000 0.750000 0.750000
As4 0.812500 0.812500 0.812500
K_POINTS automatic
3 3 3 0 0 0
HUBBARD (atomic)
U In0-5p -2.24
U Ga0-4p -2.04
U As0-4p 4.02
U Sb0-5p 4.43
U In1-5p -2.2575000000000003
U Ga1-4p -2.39
U As1-4p 4.5424999999999995
U Sb1-5p 4.7825
U In2-5p -2.2750000000000004
U Ga2-4p -2.74
U As2-4p 5.0649999999999995
U Sb2-5p 5.135
U In3-5p -2.2925
U Ga3-4p -3.09
U As3-4p 5.5875
U Sb3-5p 5.4875
U In4-5p -2.31
U Ga4-4p -3.44
U As4-4p 6.11
U Sb4-5p 5.84

The output file gets through several optimisation steps and then stalls at an scf iteration, as below:

     Number of occupied Hubbard levels = 615.3045
     total cpu time spent up to now is 166039.0 secs
     total energy = -22879.33720322 Ry
     estimated scf accuracy < 0.00963404 Ry

     iteration # 2 ecut= 100.00 Ry beta= 0.65
     Davidson diagonalization with overlap
     ethr = 5.38E-07, avg # of iterations = 6.0
     total cpu time spent up to now is 172245.2 secs
     total energy = -22879.33685019 Ry
     estimated scf accuracy < 0.03584310 Ry

     iteration # 3 ecut= 100.00 Ry beta= 0.65
     Davidson diagonalization with overlap

I am relatively confident that this is not caused by the machine running out of memory. I get the problem on smaller cells and in bands calculations as well.

Any help is appreciated.

With kind regards,
Angus Gentles
ams-OSRAM Institute of Microelectronics, TU Wien
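
For anyone trying to reproduce the input: the U values in the HUBBARD card above are consistent with a linear interpolation, per element, between the index-0 and index-4 values. The following is a minimal sketch under that assumption (the post does not say how the values were generated, and the names ENDPOINTS, hubbard_u and hubbard_card are hypothetical helpers, not part of Quantum ESPRESSO):

```python
# Sketch only: rebuilds the HUBBARD card under the ASSUMPTION that each
# element's U value is linearly interpolated between its index-0 and
# index-4 entries. Endpoint values are copied from the input file above.
ENDPOINTS = {
    # element: (manifold, U at index 0, U at index 4)
    "In": ("5p", -2.24, -2.31),
    "Ga": ("4p", -2.04, -3.44),
    "As": ("4p", 4.02, 6.11),
    "Sb": ("5p", 4.43, 5.84),
}


def hubbard_u(element, index, n_steps=4):
    """Linearly interpolate U between the index-0 and index-n_steps values."""
    _, u_first, u_last = ENDPOINTS[element]
    return u_first + (u_last - u_first) * index / n_steps


def hubbard_card():
    """Return the HUBBARD card as text, in the same order as the input file."""
    lines = ["HUBBARD (atomic)"]
    for index in range(5):
        for element in ("In", "Ga", "As", "Sb"):
            manifold = ENDPOINTS[element][0]
            value = hubbard_u(element, index)
            lines.append(f"U {element}{index}-{manifold} {value:.4f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hubbard_card())
```

The values are printed to four decimals, which avoids floating-point tails such as -2.2575000000000003 in the generated card.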