On Mon, Mar 20, 2017 at 12:45 PM, Joshua Mora <joshua_m...@usa.net> wrote:
> If at certain x msg size you achieve X performance (MB/s) and at 2x msg
> size or higher you achieve Y performance, with Y significantly lower than
> X, is it possible to have a parameter that chops messages internally to x
> size in order to sustain X performance rather than let it choke?

Unfortunately not. After a certain message size you hit the hardware memory
bandwidth limit, and no pipelining can help. To push it up you would need a
single copy instead of two, but vader should do this by default as long as
KNEM or CMA is available on the machine. (A minimal sketch of how to check
and force this is included after the quoted thread below.)

  George.

> Sort of flow control to avoid congestion?
> If that is possible, what would be that parameter for vader?
>
> Other than source code, is there any detailed documentation or study of
> the vader-related parameters for improving bandwidth at large message
> sizes? I did see some documentation for sm, but not for vader.
>
> Thanks,
> Joshua
>
>
> ------ Original Message ------
> Received: 03:06 PM CDT, 03/17/2017
> From: George Bosilca <bosi...@icl.utk.edu>
> To: Joshua Mora <joshua_m...@usa.net>
> Cc: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] tuning sm/vader for large messages
>
> > On Fri, Mar 17, 2017 at 3:33 PM, Joshua Mora <joshua_m...@usa.net> wrote:
> >
> > > Thanks for the quick reply.
> > > This test is between 2 cores that are on different CPUs, so the data
> > > has to traverse the coherent fabric (e.g. QPI, UPI, cHT). It has to go
> > > to main memory independently of cache size. Wrong assumption?
> >
> > Depends on the usage pattern. Some benchmarks have options to
> > clean/flush the cache before each round of tests.
> >
> > > Can data be evicted from cache and put into the cache of a second core
> > > on a different CPU without placing it first in main memory?
> >
> > It would depend on the memory coherency protocol. Usually the data gets
> > marked as shared, and as a result it might not need to be pushed into
> > main memory right away.
> >
> > > I am more thinking that there is a parameter that splits large
> > > messages into smaller ones at 64k or 128k?
> >
> > Pipelining is not the answer to all situations. Once your messages are
> > larger than the caches, you have already built memory pressure (by
> > getting outside the cache size), so the pipelining is bound by the
> > memory bandwidth.
> >
> > > This seems (wrong assumption?) like the kind of parameter I would need
> > > for large messages on a NIC. Coalescing data / large MTU, ...
> >
> > Sure, but there are hard limits imposed by the hardware, especially with
> > regard to intranode communications. Once you saturate the memory bus,
> > you hit a pretty hard limit.
> >
> >   George.
> >
> > > Joshua
> > >
> > > ------ Original Message ------
> > > Received: 02:15 PM CDT, 03/17/2017
> > > From: George Bosilca <bosi...@icl.utk.edu>
> > > To: Open MPI Users <users@lists.open-mpi.org>
> > > Subject: Re: [OMPI users] tuning sm/vader for large messages
> > >
> > > > Joshua,
> > > >
> > > > In shared memory the bandwidth depends on many parameters, including
> > > > the process placement and the size of the different cache levels.
> > > > In your particular case I guess that after 128k you are outside the
> > > > L2 cache (1/2 of the cache, in fact) and the bandwidth will drop as
> > > > the data needs to be flushed to main memory.
> > > >
> > > >   George.
> > > >
> > > > On Fri, Mar 17, 2017 at 1:47 PM, Joshua Mora <joshua_m...@usa.net> wrote:
> > > >
> > > > > Hello,
> > > > > I am trying to get the max bw for shared memory communications
> > > > > using the osu_[bw,bibw,mbw_mr] benchmarks.
> > > > > I am observing a peak at ~64k/128k msg size and then a drop
> > > > > instead of sustained bandwidth.
> > > > > What parameters or Linux config do I need to add to the default
> > > > > Open MPI settings to improve this?
> > > > > I am already using vader and knem.
> > > > >
> > > > > See below the one-way bandwidth, with the peak at 64k.
> > > > >
> > > > > # Size      Bandwidth (MB/s)
> > > > > 1                  1.02
> > > > > 2                  2.13
> > > > > 4                  4.03
> > > > > 8                  8.48
> > > > > 16                11.90
> > > > > 32                23.29
> > > > > 64                47.33
> > > > > 128               88.08
> > > > > 256              136.77
> > > > > 512              245.06
> > > > > 1024             263.79
> > > > > 2048             405.49
> > > > > 4096            1040.46
> > > > > 8192            1964.81
> > > > > 16384           2983.71
> > > > > 32768           5705.11
> > > > > 65536           7181.11
> > > > > 131072          6490.55
> > > > > 262144          4449.59
> > > > > 524288          4898.14
> > > > > 1048576         5324.45
> > > > > 2097152         5539.79
> > > > > 4194304         5669.76
> > > > >
> > > > > Thanks,
> > > > > Joshua
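
For reference, here is a minimal sketch of how to inspect the vader knobs
and make the single-copy choice explicit. The option and parameter names
(ompi_info's --level flag, btl_vader_single_copy_mechanism, the
ppr:1:socket mapping) are taken from Open MPI 2.x and may differ in other
releases, so treat this as a starting point rather than a verified recipe
for your system:

  # List every vader BTL MCA parameter with its default value and
  # description; this is the closest thing to built-in documentation
  # of the vader tuning knobs.
  ompi_info --param btl vader --level 9

  # Run osu_bw with one rank per socket, explicitly selecting the vader
  # BTL and requesting KNEM as the single-copy mechanism. vader normally
  # picks KNEM or CMA on its own when available; forcing it here simply
  # makes the choice explicit.
  mpirun -n 2 --map-by ppr:1:socket --bind-to core \
      --mca btl vader,self \
      --mca btl_vader_single_copy_mechanism knem \
      ./osu_bw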
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users