Thanks. Fair enough. I will mark 2.0.2 as faulty for myself, and try the
latest version when I have time for this.

On Wed, Jun 6, 2018 at 2:40 PM, Jeff Squyres (jsquyres) via users <
users@lists.open-mpi.org> wrote:

> Alexander --
>
> I don't know offhand if 2.0.2 was faulty in this area.  We usually ask
> users to upgrade to at least the latest release in a given series (e.g.,
> 2.0.4) because various bug fixes are included in each sub-release.  It
> wouldn't be much use to go through all the effort of composing a proper
> bug report for v2.0.2 if the issue was already fixed by v2.0.4.
>
>
> > On Jun 6, 2018, at 7:40 AM, Alexander Supalov <
> alexander.supa...@gmail.com> wrote:
> >
> > Thanks. This was not my question. I want to know if 2.0.2 was indeed
> faulty in this area.
> >
> > On Wed, Jun 6, 2018 at 1:22 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
> > Alexander,
> >
> > Note that the v2.0 series is no longer supported; you should upgrade to
> v3.1, v3.0, or v2.1.
> >
> > You might have to force the TCP buffer sizes to 0 for optimal
> performance:
> > IIRC, mpirun --mca btl_tcp_sndbuf_size 0 --mca btl_tcp_rcvbuf_size 0 ...
> > (I am afk, so please confirm both parameter names and default values
> with ompi_info --all.)
> > If you upgrade to the latest version, the default parameters are already
> optimal.
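> > For instance, something along these lines should show the exact parameter
> > names and their current defaults (the names above are from memory, so
> > treat this as a sketch to be confirmed):
> > ompi_info --all | grep btl_tcp_ | grep buf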
> >
> > Last but not least, the btl/tcp component uses all the available
> interfaces by default, so you might want to restrict it to a single
> interface first:
> > mpirun --mca btl_tcp_if_include 192.168.0.0/24 ...
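> > Putting both suggestions together for the hosts listed further down in
> > this thread, a possible combined invocation (the 192.168.178.0/24 subnet
> > is taken from that host list, and the parameter names are as above, still
> > to be confirmed) would be:
> > mpirun --mca btl_tcp_if_include 192.168.178.0/24 --mca btl_tcp_sndbuf_size 0 --mca btl_tcp_rcvbuf_size 0 -map-by node -hostfile mpi.hosts -n 2 ./pingpong1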
> >
> > Hope this helps!
> >
> > Gilles
> >
> > On Wednesday, June 6, 2018, Alexander Supalov <
> alexander.supa...@gmail.com> wrote:
> > Hi everybody,
> >
> > I noticed that sockets do not seem to work properly in the Open MPI
> version mentioned above. Intranode runs are OK. Internode, over 100 Mbit/s
> Ethernet, I can only go as high as 32 KiB messages in a simple ping-pong
> style MPI benchmark. Before I start composing a full bug report: is this
> another known issue?
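> >
> > For reference, a minimal sketch of this kind of ping-pong loop (the actual
> > pingpong1 source is not included in this message, so the buffer size,
> > message type, and output format below are illustrative only):
> >
> > #include <mpi.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> >
> > int main(int argc, char **argv)
> > {
> >     MPI_Init(&argc, &argv);
> >     int rank;
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     char *buf = malloc(1 << 20);            /* largest message: 1 MiB */
> >     for (int bytes = 1; bytes <= (1 << 20); bytes *= 2) {
> >         double t0 = MPI_Wtime();
> >         if (rank == 0) {                    /* rank 0 sends first ...   */
> >             MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
> >             MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> >         } else if (rank == 1) {             /* ... rank 1 echoes it back */
> >             MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> >             MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
> >         }
> >         double t = MPI_Wtime() - t0;        /* round-trip time */
> >         if (rank == 0)
> >             printf("bytes = %d  time = %g  lat = %g  bw = %g\n",
> >                    bytes, t, t / 2.0, bytes / (t / 2.0));
> >     }
> >     free(buf);
> >     MPI_Finalize();
> >     return 0;
> > }
> >
> > Built with mpicc and run over TCP, a loop like this produces per-size
> > latency and bandwidth figures similar to the diagnostics below.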
> >
> > Here are the diagnostics under Ubuntu 16.04 LTS:
> >
> > papa@pete:~/Documents/Projects/Books/Inside/MPI/Source/openmpi-2.0.2-plain$ mpirun --prefix /home/papa/Documents/Projects/Books/Inside/MPI/Source/openmpi-2.0.2-installed -map-by node -hostfile mpi.hosts -n 2 $PWD/pingpong1
> > r = 0    bytes = 0        iters = 1    time = 0.0156577      lat = 0.00782887      bw = 0
> > r = 1    bytes = 0        iters = 1    time = 0.011045       lat = 0.00552249      bw = 0
> > r = 0    bytes = 1        iters = 1    time = 0.000459942    lat = 0.000229971     bw = 4348.37
> > r = 1    bytes = 1        iters = 1    time = 0.000268888    lat = 0.000134444     bw = 7438.04
> > r = 0    bytes = 2        iters = 1    time = 0.000386158    lat = 0.000193079     bw = 10358.5
> > r = 1    bytes = 2        iters = 1    time = 0.000253175    lat = 0.000126587     bw = 15799.3
> > r = 0    bytes = 4        iters = 1    time = 0.000388046    lat = 0.000194023     bw = 20616.1
> > r = 1    bytes = 4        iters = 1    time = 0.000235434    lat = 0.000117717     bw = 33979.8
> > r = 0    bytes = 8        iters = 1    time = 0.000354141    lat = 0.00017707      bw = 45179.8
> > r = 1    bytes = 8        iters = 1    time = 0.000240324    lat = 0.000120162     bw = 66576.8
> > r = 0    bytes = 16       iters = 1    time = 0.000350701    lat = 0.000175351     bw = 91245.8
> > r = 1    bytes = 16       iters = 1    time = 0.000184242    lat = 9.2121e-05      bw = 173685
> > r = 0    bytes = 32       iters = 1    time = 0.000351037    lat = 0.000175518     bw = 182317
> > r = 1    bytes = 32       iters = 1    time = 0.00025953     lat = 0.000129765     bw = 246600
> > r = 0    bytes = 64       iters = 1    time = 0.000425288    lat = 0.000212644     bw = 300973
> > r = 1    bytes = 64       iters = 1    time = 0.000241162    lat = 0.000120581     bw = 530764
> > r = 0    bytes = 128      iters = 1    time = 0.000401526    lat = 0.000200763     bw = 637568
> > r = 1    bytes = 128      iters = 1    time = 0.000279226    lat = 0.000139613     bw = 916820
> > r = 0    bytes = 256      iters = 1    time = 0.000436665    lat = 0.000218332     bw = 1.17252e+06
> > r = 1    bytes = 256      iters = 1    time = 0.000269657    lat = 0.000134829     bw = 1.89871e+06
> > r = 0    bytes = 512      iters = 1    time = 0.000496634    lat = 0.000248317     bw = 2.06188e+06
> > r = 1    bytes = 512      iters = 1    time = 0.000291029    lat = 0.000145514     bw = 3.51855e+06
> > r = 1    bytes = 1024     iters = 1    time = 0.000405219    lat = 0.000202609     bw = 5.05406e+06
> > r = 0    bytes = 1024     iters = 1    time = 0.000672843    lat = 0.000336421     bw = 3.0438e+06
> > r = 0    bytes = 2048     iters = 1    time = 0.000874569    lat = 0.000437284     bw = 4.68345e+06
> > r = 1    bytes = 2048     iters = 1    time = 0.000489308    lat = 0.000244654     bw = 8.37101e+06
> > r = 1    bytes = 4096     iters = 1    time = 0.000853111    lat = 0.000426556     bw = 9.6025e+06
> > r = 0    bytes = 4096     iters = 1    time = 0.00142215     lat = 0.000711077     bw = 5.76027e+06
> > r = 0    bytes = 8192     iters = 1    time = 0.00239346     lat = 0.00119673      bw = 6.84531e+06
> > r = 1    bytes = 8192     iters = 1    time = 0.00132503     lat = 0.000662515     bw = 1.2365e+07
> > r = 0    bytes = 16384    iters = 1    time = 0.004443       lat = 0.0022215       bw = 7.37519e+06
> > r = 1    bytes = 16384    iters = 1    time = 0.00255605     lat = 0.00127803      bw = 1.28198e+07
> > r = 1    bytes = 32768    iters = 1    time = 0.0046272      lat = 0.0023136       bw = 1.41632e+07
> > r = 0    bytes = 32768    iters = 1    time = 0.00812741     lat = 0.0040637       bw = 8.06358e+06
> > [pete:07038] *** Process received signal ***
> > [pete:07038] Signal: Segmentation fault (11)
> > [pete:07038] Signal code:  (128)
> > [pete:07038] Failing at address: (nil)
> > [pete:07038] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f716325a390]
> > [pete:07038] [ 1] /home/papa/Documents/Projects/Books/Inside/MPI/Source/openmpi-2.0.2-installed/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_put+0x1b)[0x7f715a32192b]
> > [pete:07038] [ 2] /home/papa/Documents/Projects/Books/Inside/MPI/Source/openmpi-2.0.2-installed/lib/openmpi/mca_btl_tcp.so(+0x7eae)[0x7f715a52deae]
> > [pete:07038] [ 3] /home/papa/Documents/Projects/Books/Inside/MPI/Source/openmpi-2.0.2-installed/lib/libopen-pal.so.20(opal_libevent2022_event_base_loop+0x7f3)[0x7f7162975ef3]
> > [pete:07038] [ 4] /home/papa/Documents/Projects/Books/Inside/MPI/Source/openmpi-2.0.2-installed/lib/libopen-pal.so.20(opal_progress+0x101)[0x7f71629361b1]
> > [pete:07038] [ 5] /home/papa/Documents/Projects/Books/Inside/MPI/Source/openmpi-2.0.2-installed/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x2b5)[0x7f715a312a95]
> > [pete:07038] [ 6] /home/papa/Documents/Projects/Books/Inside/MPI/Source/openmpi-2.0.2-installed/lib/libmpi.so.20(PMPI_Send+0x14b)[0x7f71634d5ffb]
> > [pete:07038] [ 7] /home/papa/Documents/Projects/Books/Inside/MPI/Source/openmpi-2.0.2-plain/pingpong1[0x400b25]
> > [pete:07038] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7162e9f830]
> > [pete:07038] [ 9] /home/papa/Documents/Projects/Books/Inside/MPI/Source/openmpi-2.0.2-plain/pingpong1[0x400999]
> > [pete:07038] *** End of error message ***
> > --------------------------------------------------------------------------
> > mpirun noticed that process rank 0 with PID 0 on node pete exited on signal 11 (Segmentation fault).
> > --------------------------------------------------------------------------
> >
> > Host list:
> >
> > 192.168.178.31
> > 192.168.178.32
> >
> > Platform:
> >
> > Intel, Ubuntu 16.04 LTS on one side, Ubuntu 14.04 LTS on the other, Open
> MPI 2.0.2 or 2.1.0 on both, 100 Mbit/s Ethernet in between.
> >
> > Note that I have to map by node in order to get internode connectivity
> tested; otherwise I get an intranode run, which is a bit unexpected given
> the host file.
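> > (A hostfile that caps the slot count per host, e.g.
> >
> > 192.168.178.31 slots=1
> > 192.168.178.32 slots=1
> >
> > should also spread the two ranks across the nodes without an explicit
> > -map-by node, although I have not verified that on this version.)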
> >
> > Best regards.
> >
> > Alexander
> >
> >
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
