I do here is the output:

2 total processes killed (some possibly by mpirun during cleanup)
[pandora:12238] *** Process received signal ***
[pandora:12238] Signal: Segmentation fault (11)
[pandora:12238] Signal code: Invalid permissions (2)
[pandora:12238] Failing at address: 0x7f5c8e31fff0
[pandora:12238] [ 0] /usr/lib64/libpthread.so.0(+0xf680)[0x7f5ca205f680]
[pandora:12238] [ 1] [pandora:12237] *** Process received signal ***
/usr/lib64/libc.so.6(+0x14c4a0)[0x7f5ca1dcc4a0]
[pandora:12238] [ 2] [pandora:12237] Signal: Segmentation fault (11)
[pandora:12237] Signal code: Invalid permissions (2)
[pandora:12237] Failing at address: 0x7f6c4ab3fff0
/opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x4be55)[0x7f5ca16fbe55]
[pandora:12238] [ 3]
/opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x23b)[0x7f5ca231798b]
[pandora:12238] [ 4]
/opt/openmpi/4.0.0/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1c7)[0x7f5ca22eeda7]
[pandora:12238] [ 5] IMB-MPI1[0x40b83b]
[pandora:12238] [ 6] IMB-MPI1[0x407155]
[pandora:12238] [ 7] IMB-MPI1[0x4022ea]
[pandora:12238] [ 8]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f5ca1ca23d5]
[pandora:12238] [ 9] IMB-MPI1[0x401d49]
[pandora:12238] *** End of error message ***
[pandora:12237] [ 0] /usr/lib64/libpthread.so.0(+0xf680)[0x7f6c5e73f680]
[pandora:12237] [ 1] /usr/lib64/libc.so.6(+0x14c4a0)[0x7f6c5e4ac4a0]
[pandora:12237] [ 2]
/opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x4be55)[0x7f6c5dddbe55]
[pandora:12237] [ 3]
/opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x23b)[0x7f6c5e9f798b]
[pandora:12237] [ 4]
/opt/openmpi/4.0.0/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1c7)[0x7f6c5e9ceda7]
[pandora:12237] [ 5] IMB-MPI1[0x40b83b]
[pandora:12237] [ 6] IMB-MPI1[0x407155]
[pandora:12237] [ 7] IMB-MPI1[0x4022ea]
[pandora:12237] [ 8]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6c5e3823d5]
[pandora:12237] [ 9] IMB-MPI1[0x401d49]
[pandora:12237] *** End of error message ***
[phoebe:07408] *** Process received signal ***
[phoebe:07408] Signal: Segmentation fault (11)
[phoebe:07408] Signal code: Invalid permissions (2)
[phoebe:07408] Failing at address: 0x7f6b9ca9fff0
[titan:07169] *** Process received signal ***
[titan:07169] Signal: Segmentation fault (11)
[titan:07169] Signal code: Invalid permissions (2)
[titan:07169] Failing at address: 0x7fc01295fff0
[phoebe:07408] [ 0] /usr/lib64/libpthread.so.0(+0xf680)[0x7f6bb03b7680]
[phoebe:07408] [ 1] /usr/lib64/libc.so.6(+0x14c4a0)[0x7f6bb01244a0]
[phoebe:07408] [ 2] [titan:07169] [ 0]
/usr/lib64/libpthread.so.0(+0xf680)[0x7fc026117680]
[titan:07169] [ 1]
/opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x4be55)[0x7f6bafa53e55]
[phoebe:07408] [ 3] /usr/lib64/libc.so.6(+0x14c4a0)[0x7fc025e844a0]
[titan:07169] [ 2]
/opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x23b)[0x7f6bb066f98b]
[phoebe:07408] [ 4]
/opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x4be55)[0x7fc0257b3e55]
[titan:07169] [ 3]
/opt/openmpi/4.0.0/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1c7)[0x7f6bb0646da7]
[phoebe:07408] [ 5] IMB-MPI1[0x40b83b]
[phoebe:07408] [ 6] IMB-MPI1[0x407155]
/opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x23b)[0x7fc0263cf98b]
[titan:07169] [ 4] [phoebe:07408] [ 7] IMB-MPI1[0x4022ea]
[phoebe:07408] [ 8]
/opt/openmpi/4.0.0/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1c7)[0x7fc0263a6da7]
[titan:07169] [ 5] IMB-MPI1[0x40b83b]
[titan:07169] [ 6] IMB-MPI1[0x407155]
[titan:07169] [ 7]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6bafffa3d5]
[phoebe:07408] [ 9] IMB-MPI1[0x401d49]
[phoebe:07408] *** End of error message ***
IMB-MPI1[0x4022ea]
[titan:07169] [ 8]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fc025d5a3d5]
[titan:07169] [ 9] IMB-MPI1[0x401d49]
[titan:07169] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node pandora exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------
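
Because several ranks write to the same terminal, the traces above are interleaved and hard to read. As a reading aid, here is a minimal sketch (not part of Open MPI or IMB; `summarize` and `SAMPLE` are hypothetical names used only for illustration) that collects the signal code and failing address for each host:pid pair. It assumes those two header lines arrive intact, which holds for the output above but is not guaranteed in general.

```python
import re

# Matches intact header lines such as
#   [pandora:12238] Signal code: Invalid permissions (2)
# The PID is kept as a string so leading zeros (e.g. 07408) survive.
HEADER = re.compile(
    r"\[(?P<host>[\w.-]+):(?P<pid>\d+)\] "
    r"(?P<key>Signal code|Failing at address): (?P<value>.+)"
)


def summarize(log_text):
    """Return {(host, pid): {"Signal code": ..., "Failing at address": ...}}."""
    summary = {}
    for line in log_text.splitlines():
        # search() rather than match(): a header may follow another
        # process's frame tag on the same line when output is interleaved.
        found = HEADER.search(line)
        if found:
            proc = (found.group("host"), found.group("pid"))
            summary.setdefault(proc, {})[found.group("key")] = (
                found.group("value").strip()
            )
    return summary


# A few lines copied from the output above, used as sample input.
SAMPLE = """\
[pandora:12238] Signal: Segmentation fault (11)
[pandora:12238] Signal code: Invalid permissions (2)
[pandora:12238] Failing at address: 0x7f5c8e31fff0
[pandora:12238] [ 1] [pandora:12237] *** Process received signal ***
[pandora:12238] [ 2] [pandora:12237] Signal: Segmentation fault (11)
[pandora:12237] Signal code: Invalid permissions (2)
[pandora:12237] Failing at address: 0x7f6c4ab3fff0
"""

if __name__ == "__main__":
    for (host, pid), info in summarize(SAMPLE).items():
        print(host, pid, info.get("Signal code"), info.get("Failing at address"))
```

Saving the full mpirun output to a file and passing its contents to `summarize` gives one entry per crashed process, which makes it easier to compare runs made with different `--mca btl` settings.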


- Adam LeBlanc

On Wed, Feb 20, 2019 at 1:20 PM Howard Pritchard <hpprit...@gmail.com>
wrote:

> Hi Adam,
>
> As a sanity check, do you still see the segmentation fault if you use
> --mca btl self,vader,tcp instead?
>
> Howard
>
>
> On Wed, Feb 20, 2019 at 08:50, Adam LeBlanc <
> alebl...@iol.unh.edu> wrote:
>
>> Hello,
>>
>> When I do a run with Open MPI v4.0.0 over InfiniBand using this command:
>> mpirun --mca btl_openib_warn_no_device_params_found 0 --map-by node --mca
>> orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 --mca
>> btl_openib_allow_ib 1 -np 6
>>  -hostfile /home/aleblanc/ib-mpi-hosts IMB-MPI1
>>
>> I get this error:
>>
>> #----------------------------------------------------------------
>> # Benchmarking Reduce_scatter
>> # #processes = 4
>> # ( 2 additional processes waiting in MPI_Barrier)
>> #----------------------------------------------------------------
>>        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>>             0         1000         0.14         0.15         0.14
>>             4         1000         5.00         7.58         6.28
>>             8         1000         5.13         7.68         6.41
>>            16         1000         5.05         7.74         6.39
>>            32         1000         5.43         7.96         6.75
>>            64         1000         6.78         8.56         7.69
>>           128         1000         7.77         9.55         8.59
>>           256         1000         8.28        10.96         9.66
>>           512         1000         9.19        12.49        10.85
>>          1024         1000        11.78        15.01        13.38
>>          2048         1000        17.41        19.51        18.52
>>          4096         1000        25.73        28.22        26.89
>>          8192         1000        47.75        49.44        48.79
>>         16384         1000        81.10        90.15        84.75
>>         32768         1000       163.01       178.58       173.19
>>         65536          640       315.63       340.51       333.18
>>        131072          320       475.48       528.82       510.85
>>        262144          160       979.70      1063.81      1035.61
>>        524288           80      2070.51      2242.58      2150.15
>>       1048576           40      4177.36      4527.25      4431.65
>>       2097152           20      8738.08      9340.50      9147.89
>> [pandora:04500] *** Process received signal ***
>> [pandora:04500] Signal: Segmentation fault (11)
>> [pandora:04500] Signal code: Address not mapped (1)
>> [pandora:04500] Failing at address: 0x7f310ebffff0
>> [pandora:04499] *** Process received signal ***
>> [pandora:04499] Signal: Segmentation fault (11)
>> [pandora:04499] Signal code: Address not mapped (1)
>> [pandora:04499] Failing at address: 0x7f28b11ffff0
>> [pandora:04500] [ 0] /usr/lib64/libpthread.so.0(+0xf680)[0x7f3126bef680]
>> [pandora:04500] [ 1] /usr/lib64/libc.so.6(+0x14c4a0)[0x7f312695c4a0]
>> [pandora:04500] [ 2]
>> /opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x4be55)[0x7f312628be55]
>> [pandora:04500] [ 3] [pandora:04499] [ 0]
>> /opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x23b)[0x7f3126ea798b]
>> [pandora:04500] [ 4] /usr/lib64/libpthread.so.0(+0xf680)[0x7f28c91ef680]
>> [pandora:04499] [ 1]
>> /opt/openmpi/4.0.0/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1c7)[0x7f3126e7eda7]
>> [pandora:04500] [ 5] IMB-MPI1[0x40b83b]
>> [pandora:04500] [ 6] IMB-MPI1[0x407155]
>> [pandora:04500] [ 7] IMB-MPI1[0x4022ea]
>> [pandora:04500] [ 8] /usr/lib64/libc.so.6(+0x14c4a0)[0x7f28c8f5c4a0]
>> [pandora:04499] [ 2]
>> /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f31268323d5]
>> [pandora:04500] [ 9] IMB-MPI1[0x401d49]
>> [pandora:04500] *** End of error message ***
>> /opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x4be55)[0x7f28c888be55]
>> [pandora:04499] [ 3]
>> /opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x23b)[0x7f28c94a798b]
>> [pandora:04499] [ 4]
>> /opt/openmpi/4.0.0/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1c7)[0x7f28c947eda7]
>> [pandora:04499] [ 5] IMB-MPI1[0x40b83b]
>> [pandora:04499] [ 6] IMB-MPI1[0x407155]
>> [pandora:04499] [ 7] IMB-MPI1[0x4022ea]
>> [pandora:04499] [ 8]
>> /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f28c8e323d5]
>> [pandora:04499] [ 9] IMB-MPI1[0x401d49]
>> [pandora:04499] *** End of error message ***
>> [phoebe:03779] *** Process received signal ***
>> [phoebe:03779] Signal: Segmentation fault (11)
>> [phoebe:03779] Signal code: Address not mapped (1)
>> [phoebe:03779] Failing at address: 0x7f483d6ffff0
>> [phoebe:03779] [ 0] /usr/lib64/libpthread.so.0(+0xf680)[0x7f48556c7680]
>> [phoebe:03779] [ 1] /usr/lib64/libc.so.6(+0x14c4a0)[0x7f48554344a0]
>> [phoebe:03779] [ 2]
>> /opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x4be55)[0x7f4854d63e55]
>> [phoebe:03779] [ 3]
>> /opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x23b)[0x7f485597f98b]
>> [phoebe:03779] [ 4]
>> /opt/openmpi/4.0.0/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1c7)[0x7f4855956da7]
>> [phoebe:03779] [ 5] IMB-MPI1[0x40b83b]
>> [phoebe:03779] [ 6] IMB-MPI1[0x407155]
>> [phoebe:03779] [ 7] IMB-MPI1[0x4022ea]
>> [phoebe:03779] [ 8]
>> /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f485530a3d5]
>> [phoebe:03779] [ 9] IMB-MPI1[0x401d49]
>> [phoebe:03779] *** End of error message ***
>> --------------------------------------------------------------------------
>> Primary job  terminated normally, but 1 process returned
>> a non-zero exit code. Per user-direction, the job has been aborted.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 1 with PID 3779 on node phoebe-ib exited
>> on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>>
>> Also, if I reinstall 3.1.2, I do not have this issue at all.
>>
>> Any thoughts on what could be the issue?
>>
>> Thanks,
>> Adam LeBlanc
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
>