Hi, I have the following issue with Open MPI version 4.0.1 when running on a node with two 16-core CPUs (Intel Xeon Gold 6142) installed. Running with 30 ranks or fewer is fine, and running with 33 or more gives the "not enough slots" message, which is expected.
However, using 31 or 32 ranks results in the following error:

[nodek19:391429] *** Process received signal ***
[nodek19:391429] Signal: Segmentation fault (11)
[nodek19:391429] Signal code: Address not mapped (1)
[nodek19:391429] Failing at address: 0x7fa34954d008
[nodek19:391429] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fa348dfc5d0]
[nodek19:391429] [ 1] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x11a)[0x7fa345ded16a]
[nodek19:391429] [ 2] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x833)[0x7fa3493c8df3]
[nodek19:391429] [ 3] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(+0x1e14)[0x7fa345dece14]
[nodek19:391429] [ 4] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_gds_base_select+0x108)[0x7fa345b73fe8]
[nodek19:391429] [ 5] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_rte_init+0x7c3)[0x7fa345b30f83]
[nodek19:391429] [ 6] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_PMIx_Init+0x168)[0x7fa345aefd08]
[nodek19:391429] [ 7] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_pmix_pmix3x.so(pmix3x_client_init+0xbb)[0x7fa345bc4fdb]
[nodek19:391429] [ 8] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_ess_pmi.so(+0x1ad6)[0x7fa3467f2ad6]
[nodek19:391429] [ 9] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libopen-rte.so.40(orte_init+0x291)[0x7fa348780b21]
[nodek19:391429] [10] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(ompi_mpi_init+0x264)[0x7fa349058a24]
[nodek19:391429] [11] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(MPI_Init+0x99)[0x7fa349088b89]
[nodek19:391429] [12] mpitest[0x4007fe]
[nodek19:391429] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa348a423d5]
[nodek19:391429] [14] mpitest[0x400729]
[nodek19:391429] *** End of error message ***

Furthermore, the issue doesn't occur on computers with two Intel Xeon Gold 6132 (14 cores each) or two 6126 (12 cores each) CPUs. There I'm able to use all the cores, 28 and 24 respectively. Version 3.1.4 works across all three computers without issue.

Any comments would be appreciated.

Regards,
T.
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users