It might be worth trying the latest v4.0.x nightly snapshot -- we just updated the internal PMIx on the v4.0.x branch:
https://www.open-mpi.org/nightly/v4.0.x/

> On Jul 10, 2019, at 1:29 PM, Steven Varga via users
> <users@lists.open-mpi.org> wrote:
>
> Hi, I am fighting a similar issue. Did you try updating to the most recent
> PMIx 3.1.3 series release?
>
> On Wed, Jul 10, 2019, 12:24 Raymond Arter via users,
> <users@lists.open-mpi.org> wrote:
> Hi,
>
> I have the following issue with version 4.0.1 when running on a node with
> two 16-core CPUs (Intel Xeon Gold 6142) installed. Running with 30 or fewer
> ranks works fine, and running 33 or more gives the expected "not enough
> slots" message.
>
> However, using 31 or 32 ranks results in the following error:
>
> [nodek19:391429] *** Process received signal ***
> [nodek19:391429] Signal: Segmentation fault (11)
> [nodek19:391429] Signal code: Address not mapped (1)
> [nodek19:391429] Failing at address: 0x7fa34954d008
> [nodek19:391429] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fa348dfc5d0]
> [nodek19:391429] [ 1] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x11a)[0x7fa345ded16a]
> [nodek19:391429] [ 2] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x833)[0x7fa3493c8df3]
> [nodek19:391429] [ 3] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(+0x1e14)[0x7fa345dece14]
> [nodek19:391429] [ 4] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_gds_base_select+0x108)[0x7fa345b73fe8]
> [nodek19:391429] [ 5] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_rte_init+0x7c3)[0x7fa345b30f83]
> [nodek19:391429] [ 6] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_PMIx_Init+0x168)[0x7fa345aefd08]
> [nodek19:391429] [ 7] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_pmix_pmix3x.so(pmix3x_client_init+0xbb)[0x7fa345bc4fdb]
> [nodek19:391429] [ 8] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_ess_pmi.so(+0x1ad6)[0x7fa3467f2ad6]
> [nodek19:391429] [ 9] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libopen-rte.so.40(orte_init+0x291)[0x7fa348780b21]
> [nodek19:391429] [10] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(ompi_mpi_init+0x264)[0x7fa349058a24]
> [nodek19:391429] [11] /opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(MPI_Init+0x99)[0x7fa349088b89]
> [nodek19:391429] [12] mpitest[0x4007fe]
> [nodek19:391429] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa348a423d5]
> [nodek19:391429] [14] mpitest[0x400729]
> [nodek19:391429] *** End of error message ***
>
> Furthermore, on computers with two Intel Xeon Gold 6132 (14 cores) or
> 6126 (12 cores) CPUs, the issue doesn't occur: I'm able to use all the
> cores, 28 and 24 respectively. Version 3.1.4 works on all three computers
> without issue.
>
> Any comments would be appreciated.
>
> Regards,
>
> T.
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users

--
Jeff Squyres
jsquy...@cisco.com