Hi,

I have the following issue with version 4.0.1 when running on a node with
two 16-core CPUs (Intel Xeon Gold 6142) installed. Running with 30 or fewer
ranks is fine, and running with 33 or more gives the "not enough slots"
message, which is expected.

However, using 31 or 32 ranks results in the following error:

[nodek19:391429] *** Process received signal ***
[nodek19:391429] Signal: Segmentation fault (11)
[nodek19:391429] Signal code: Address not mapped (1)
[nodek19:391429] Failing at address: 0x7fa34954d008
[nodek19:391429] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fa348dfc5d0]
[nodek19:391429] [ 1] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x11a)[0x7fa345ded16a]
[nodek19:391429] [ 2] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x833)[0x7fa3493c8df3]
[nodek19:391429] [ 3] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(+0x1e14)[0x7fa345dece14]
[nodek19:391429] [ 4] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_gds_base_select+0x108)[0x7fa345b73fe8]
[nodek19:391429] [ 5] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_rte_init+0x7c3)[0x7fa345b30f83]
[nodek19:391429] [ 6] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_PMIx_Init+0x168)[0x7fa345aefd08]
[nodek19:391429] [ 7] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_pmix_pmix3x.so(pmix3x_client_init+0xbb)[0x7fa345bc4fdb]
[nodek19:391429] [ 8] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_ess_pmi.so(+0x1ad6)[0x7fa3467f2ad6]
[nodek19:391429] [ 9] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libopen-rte.so.40(orte_init+0x291)[0x7fa348780b21]
[nodek19:391429] [10] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(ompi_mpi_init+0x264)[0x7fa349058a24]
[nodek19:391429] [11] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(MPI_Init+0x99)[0x7fa349088b89]
[nodek19:391429] [12] mpitest[0x4007fe]
[nodek19:391429] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa348a423d5]
[nodek19:391429] [14] mpitest[0x400729]
[nodek19:391429] *** End of error message ***


Furthermore, on nodes with two Intel Xeon Gold 6132 (14-core) or 6126
(12-core) CPUs, the issue doesn't occur: I'm able to use all the cores, 28
and 24 respectively. Version 3.1.4 works on all three node types without
issue.

Any comments would be appreciated.

Regards,

T.
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
