I don't suppose you could upgrade to 4.0.2, could you?

We just released 4.0.2 with a ton of good bug fixes.


On Oct 15, 2019, at 2:07 PM, Eric F. Alemany via users <users@lists.open-mpi.org> wrote:

Hi,

I am using Open MPI 4.0.1 on a single Ubuntu 18.04 server with 64 cores.
I compiled a hello.c file with: mpicc hello.c -o openmpi_hello

When I run mpirun -np 64 openmpi_hello, I get the following error:


[radoncjonsnow:08747] *** Process received signal ***
[radoncjonsnow:08747] Signal: Segmentation fault (11)
[radoncjonsnow:08747] Signal code: Address not mapped (1)
[radoncjonsnow:08747] Failing at address: 0x7f7b001b0008
[radoncjonsnow:08747] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7f7aff89ef20]
[radoncjonsnow:08747] [ 1] /usr/local/.openmpi/lib/pmix/mca_gds_ds21.so(pmix_gds_ds21_lock_init+0x12a)[0x7f7af79d214a]
[radoncjonsnow:08747] [ 2] /usr/local/.openmpi/lib/libmca_common_dstore.so.1(pmix_common_dstor_init+0x86b)[0x7f7af73b1b9b]
[radoncjonsnow:08747] [ 3] /usr/local/.openmpi/lib/pmix/mca_gds_ds21.so(+0x1dd4)[0x7f7af79d1dd4]
[radoncjonsnow:08747] [ 4] /usr/local/.openmpi/lib/openmpi/mca_pmix_pmix3x.so(OPAL_MCA_PMIX3X_pmix_gds_base_select+0x118)[0x7f7afc9f5ee8]
[radoncjonsnow:08747] [ 5] /usr/local/.openmpi/lib/openmpi/mca_pmix_pmix3x.so(OPAL_MCA_PMIX3X_pmix_rte_init+0x813)[0x7f7afc9b0763]
[radoncjonsnow:08747] [ 6] /usr/local/.openmpi/lib/openmpi/mca_pmix_pmix3x.so(OPAL_MCA_PMIX3X_PMIx_Init+0x198)[0x7f7afc96e868]
[radoncjonsnow:08747] [ 7] /usr/local/.openmpi/lib/openmpi/mca_pmix_pmix3x.so(pmix3x_client_init+0xcb)[0x7f7afc91870b]
[radoncjonsnow:08747] [ 8] /usr/local/.openmpi/lib/openmpi/mca_ess_pmi.so(+0x1a56)[0x7f7afd441a56]
[radoncjonsnow:08747] [ 9] /usr/local/.openmpi/lib/libopen-rte.so.40(orte_init+0x291)[0x7f7aff5ba411]
[radoncjonsnow:08747] [10] /usr/local/.openmpi/lib/libmpi.so.40(ompi_mpi_init+0x29c)[0x7f7affca79cc]
[radoncjonsnow:08747] [11] /usr/local/.openmpi/lib/libmpi.so.40(MPI_Init+0x6e)[0x7f7affcd81ae]
[radoncjonsnow:08747] [12] openmpi_hello(+0x88b)[0x55d6a7c1c88b]
[radoncjonsnow:08747] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f7aff881b97]
[radoncjonsnow:08747] [14] openmpi_hello(+0x77a)[0x55d6a7c1c77a]
[radoncjonsnow:08747] *** End of error message ***

The same error is repeated about 32 times. At the end, it says:

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 5 with PID 0 on node radoncjonsnow exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------


Do you know what I am doing wrong?

Thank you all for your help.

Best,
Eric


____________________________________________________________________________________________________________________________



Eric F. Alemany
Systems Administrator for Research
EXO Extended Operations

Stanford Medicine - Technology & Digital Services
Stanford, California 94305

--
Jeff Squyres
jsquy...@cisco.com