Jeff and Steven,
Thanks for your help.
I downloaded the nightly snapshot and it fixes the problem. I need to
do more testing tomorrow and I will report back if any issues arise.
Thanks again.
T.
On 10/07/2019 18:44, Jeff Squyres (jsquyres) via users wrote:
It might be worth trying the latest v4.0.x nightly snapshot -- we just updated
the internal PMIx on the v4.0.x branch:
https://www.open-mpi.org/nightly/v4.0.x/
On Jul 10, 2019, at 1:29 PM, Steven Varga via users <users@lists.open-mpi.org>
wrote:
Hi, I'm fighting something similar. Did you try updating PMIx to the most recent 3.1.3
series release?
On Wed, Jul 10, 2019, 12:24 Raymond Arter via users, <users@lists.open-mpi.org>
wrote:
Hi,
I have the following issue with version 4.0.1 when running on a node with
two 16-core CPUs (Intel Xeon Gold 6142) installed. Running with 30 ranks or
fewer is fine, and running with 33 or more gives the "not enough slots" message,
which is expected.
However, using 31 or 32 ranks results in the following error:
[nodek19:391429] *** Process received signal ***
[nodek19:391429] Signal: Segmentation fault (11)
[nodek19:391429] Signal code: Address not mapped (1)
[nodek19:391429] Failing at address: 0x7fa34954d008
[nodek19:391429] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fa348dfc5d0]
[nodek19:391429] [ 1]
/opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x11a)[0x7fa345ded16a]
[nodek19:391429] [ 2]
/opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x833)[0x7fa3493c8df3]
[nodek19:391429] [ 3]
/opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/pmix/mca_gds_ds21.so(+0x1e14)[0x7fa345dece14]
[nodek19:391429] [ 4]
/opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_gds_base_select+0x108)[0x7fa345b73fe8]
[nodek19:391429] [ 5]
/opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_pmix_rte_init+0x7c3)[0x7fa345b30f83]
[nodek19:391429] [ 6]
/opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libpmix.so.2(OPAL_MCA_PMIX3X_PMIx_Init+0x168)[0x7fa345aefd08]
[nodek19:391429] [ 7]
/opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_pmix_pmix3x.so(pmix3x_client_init+0xbb)[0x7fa345bc4fdb]
[nodek19:391429] [ 8]
/opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/openmpi/mca_ess_pmi.so(+0x1ad6)[0x7fa3467f2ad6]
[nodek19:391429] [ 9]
/opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libopen-rte.so.40(orte_init+0x291)[0x7fa348780b21]
[nodek19:391429] [10]
/opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(ompi_mpi_init+0x264)[0x7fa349058a24]
[nodek19:391429] [11]
/opt/apps/libs/openmpi/4.0.1/gcc/testing/lib/libmpi.so.40(MPI_Init+0x99)[0x7fa349088b89]
[nodek19:391429] [12] mpitest[0x4007fe]
[nodek19:391429] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa348a423d5]
[nodek19:391429] [14] mpitest[0x400729]
[nodek19:391429] *** End of error message ***
Furthermore, when using computers with two Intel Xeon Gold 6132 (14-core)
or 6126 (12-core) CPUs, the issue doesn't occur. I'm able to use all the cores,
28 and 24 respectively. Version 3.1.4 works across all three computers without
issue.
Any comments would be appreciated.
Regards,
T.
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users