I don't want to push it up. I just want to sustain the same bandwidth by sending at that optimal size. I'd like to see constant bandwidth from that size upward, not a significant drop once I cross a certain message size.
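To be concrete about what I mean by "chopping", here is a minimal sketch of doing the chunking at the application level, i.e. what I was hoping the BTL could do internally. CHUNK_SIZE is just the sweet spot from the measurements quoted below, not an actual vader parameter:

/* Minimal sketch: split one large transfer into fixed-size pieces so
 * that each piece stays at the "fast" message size.  CHUNK_SIZE is a
 * made-up value taken from my osu_bw peak, not a vader setting. */
#include <mpi.h>
#include <stddef.h>

#define CHUNK_SIZE (64 * 1024)

static void send_chunked(const char *buf, size_t len, int dst, MPI_Comm comm)
{
    size_t off = 0;
    while (off < len) {
        size_t n = (len - off > CHUNK_SIZE) ? CHUNK_SIZE : (len - off);
        MPI_Send(buf + off, (int)n, MPI_BYTE, dst, 0, comm);
        off += n;
    }
}

static void recv_chunked(char *buf, size_t len, int src, MPI_Comm comm)
{
    size_t off = 0;
    while (off < len) {
        size_t n = (len - off > CHUNK_SIZE) ? CHUNK_SIZE : (len - off);
        MPI_Recv(buf + off, (int)n, MPI_BYTE, src, 0, comm, MPI_STATUS_IGNORE);
        off += n;
    }
}

If the hardware memory-bandwidth ceiling really is the limit, as you explain below, then I understand this buys nothing; I mainly want to confirm there is no knob I am missing. (A stripped-down bandwidth loop to see where the drop appears is sketched at the bottom of this mail, after the quoted thread.)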
------ Original Message ------
Received: 05:11 PM CDT, 03/20/2017
From: George Bosilca <bosi...@icl.utk.edu>
To: Joshua Mora <joshua_m...@usa.net>
Cc: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] tuning sm/vader for large messages

> On Mon, Mar 20, 2017 at 12:45 PM, Joshua Mora wrote:
>
> > If at certain x msg size you achieve X performance (MB/s) and at 2x msg size
> > or higher you achieve Y performance, being Y significantly lower than X, is it
> > possible to have a parameter that chops messages internally to x size in order
> > to sustain X performance rather than let it choke ?
>
> Unfortunately not. After a certain message size you hit the hardware memory
> bandwidth limit, and no pipeline can help. To push it up you will need to
> have a single copy instead of 2, but vader should do this by default as
> long as KNEM or CMA are available on the machine.
>
> George.
>
> > sort of flow control to avoid congestion ?
> > If that is possible, what would be that parameter for vader ?
> >
> > Other than source code, is there any detailed documentation/studies of vader
> > related parameters to improve the bandwidth at large message size ? I did see
> > some documentation for sm, but not for vader.
> >
> > Thanks,
> > Joshua
> >
> >
> > ------ Original Message ------
> > Received: 03:06 PM CDT, 03/17/2017
> > From: George Bosilca
> > To: Joshua Mora
> > Cc: Open MPI Users
> > Subject: Re: [OMPI users] tuning sm/vader for large messages
> >
> > > On Fri, Mar 17, 2017 at 3:33 PM, Joshua Mora wrote:
> > >
> > > > Thanks for the quick reply.
> > > > This test is between 2 cores that are on different CPUs. Say data has to
> > > > traverse coherent fabric (eg. QPI, UPI, cHT).
> > > > It has to go to main memory independently of cache size. Wrong assumption ?
> > >
> > > Depends on the usage pattern. Some benchmarks have options to clean/flush
> > > the cache before each round of tests.
> > >
> > > > Can data be evicted from cache and put into cache of second core on different
> > > > CPU without placing it first in main memory ?
> > >
> > > It would depend on the memory coherency protocol. Usually it gets marked as
> > > shared, and as a result it might not need to be pushed into main memory
> > > right away.
> > >
> > > > I am more thinking that there is a parameter that splits large messages in
> > > > smaller ones at 64k or 128k ?
> > >
> > > Pipelining is not the answer to all situations. Once your messages are
> > > larger than the caches, you already built memory pressure (by getting
> > > outside the cache size) so the pipelining is bound by the memory bandwidth.
> > >
> > > > This seems (wrong assumption ?) like the kind of parameter I would need for
> > > > large messages on a NIC. Coalescing data / large MTU,...
> > >
> > > Sure, but there are hard limits imposed by the hardware, especially with
> > > regards to intranode communications. Once you saturate the memory bus, you
> > > hit a pretty hard limit.
> > >
> > > George.
> > > > Joshua
> > > >
> > > >
> > > > ------ Original Message ------
> > > > Received: 02:15 PM CDT, 03/17/2017
> > > > From: George Bosilca
> > > > To: Open MPI Users
> > > > Subject: Re: [OMPI users] tuning sm/vader for large messages
> > > >
> > > > > Joshua,
> > > > >
> > > > > In shared memory the bandwidth depends on many parameters, including the
> > > > > process placement and the size of the different cache levels. In your
> > > > > particular case I guess after 128k you are outside the L2 cache (1/2 of the
> > > > > cache in fact) and the bandwidth will drop as the data need to be flushed
> > > > > to main memory.
> > > > >
> > > > > George.
> > > > >
> > > > > On Fri, Mar 17, 2017 at 1:47 PM, Joshua Mora wrote:
> > > > >
> > > > > > Hello,
> > > > > > I am trying to get the max bw for shared memory communications using
> > > > > > osu_[bw,bibw,mbw_mr] benchmarks.
> > > > > > I am observing a peak at ~64k/128K msg size and then drops instead of
> > > > > > sustaining it.
> > > > > > What parameters or linux config do I need to add to default openmpi settings
> > > > > > to get this improved ?
> > > > > > I am already using vader and knem.
> > > > > >
> > > > > > See below one way bandwidth with peak at 64k.
> > > > > >
> > > > > > # Size      Bandwidth (MB/s)
> > > > > > 1                 1.02
> > > > > > 2                 2.13
> > > > > > 4                 4.03
> > > > > > 8                 8.48
> > > > > > 16               11.90
> > > > > > 32               23.29
> > > > > > 64               47.33
> > > > > > 128              88.08
> > > > > > 256             136.77
> > > > > > 512             245.06
> > > > > > 1024            263.79
> > > > > > 2048            405.49
> > > > > > 4096           1040.46
> > > > > > 8192           1964.81
> > > > > > 16384          2983.71
> > > > > > 32768          5705.11
> > > > > > 65536          7181.11
> > > > > > 131072         6490.55
> > > > > > 262144         4449.59
> > > > > > 524288         4898.14
> > > > > > 1048576        5324.45
> > > > > > 2097152        5539.79
> > > > > > 4194304        5669.76
> > > > > >
> > > > > > Thanks,
> > > > > > Joshua
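For anyone who wants to reproduce the numbers quoted above without the full OSU suite, below is a stripped-down windowed bandwidth loop in plain C. It is my own minimal sketch, not the osu_bw source; WINDOW, LOOPS and MAX_SIZE are arbitrary values. Run it with two ranks pinned to the pair of cores of interest (one per socket in my case) and watch where the reported bandwidth falls off relative to the cache sizes.

/* Minimal windowed bandwidth loop, roughly in the spirit of osu_bw.
 * Rank 0 streams WINDOW non-blocking sends per iteration, rank 1
 * posts matching receives and answers with a 1-byte ack.
 * WINDOW, LOOPS and MAX_SIZE are arbitrary, not OSU defaults. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WINDOW   64
#define LOOPS    100
#define MAX_SIZE (4 * 1024 * 1024)

int main(int argc, char **argv)
{
    int rank;
    char ack = 0;
    MPI_Request req[WINDOW];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(MAX_SIZE);

    for (size_t size = 1; size <= MAX_SIZE; size *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        for (int l = 0; l < LOOPS; l++) {
            if (rank == 0) {
                for (int w = 0; w < WINDOW; w++)
                    MPI_Isend(buf, (int)size, MPI_BYTE, 1, 0,
                              MPI_COMM_WORLD, &req[w]);
                MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
                MPI_Recv(&ack, 1, MPI_BYTE, 1, 1, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                for (int w = 0; w < WINDOW; w++)
                    MPI_Irecv(buf, (int)size, MPI_BYTE, 0, 0,
                              MPI_COMM_WORLD, &req[w]);
                MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
                MPI_Send(&ack, 1, MPI_BYTE, 0, 1, MPI_COMM_WORLD);
            }
        }

        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("%8zu  %10.2f MB/s\n", size,
                   (double)size * WINDOW * LOOPS / (t1 - t0) / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users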