On Mon, Mar 20, 2017 at 12:45 PM, Joshua Mora <joshua_m...@usa.net> wrote:
> If at certain x msg size you achieve X performance (MB/s) and at 2x msg
> size or higher you achieve Y performance, with Y significantly lower than
> X, is it possible to have a parameter that chops messages internally to x
> size in order to sustain X performance rather than let it choke?

Unfortunately not. After a certain message size you hit the hardware memory
bandwidth limit, and no pipelining can help. To push it up you would need a
single copy instead of two, but vader should do this by default as long as
KNEM or CMA is available on the machine. (A minimal sketch of how to check
and force this is included after the quoted thread below.)

  George.

> Sort of flow control to avoid congestion?
> If that is possible, what would be that parameter for vader?
>
> Other than source code, is there any detailed documentation or study of
> the vader-related parameters for improving bandwidth at large message
> sizes? I did see some documentation for sm, but not for vader.
>
> Thanks,
> Joshua
>
>
> ------ Original Message ------
> Received: 03:06 PM CDT, 03/17/2017
> From: George Bosilca <bosi...@icl.utk.edu>
> To: Joshua Mora <joshua_m...@usa.net>
> Cc: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] tuning sm/vader for large messages
>
> > On Fri, Mar 17, 2017 at 3:33 PM, Joshua Mora <joshua_m...@usa.net> wrote:
> >
> > > Thanks for the quick reply.
> > > This test is between 2 cores that are on different CPUs, so the data
> > > has to traverse the coherent fabric (e.g. QPI, UPI, cHT). It has to go
> > > to main memory independently of cache size. Wrong assumption?
> >
> > Depends on the usage pattern. Some benchmarks have options to
> > clean/flush the cache before each round of tests.
> >
> > > Can data be evicted from cache and put into the cache of a second core
> > > on a different CPU without placing it first in main memory?
> >
> > It would depend on the memory coherency protocol. Usually the data gets
> > marked as shared, and as a result it might not need to be pushed into
> > main memory right away.
> >
> > > I am more thinking that there is a parameter that splits large
> > > messages into smaller ones at 64k or 128k?
> >
> > Pipelining is not the answer to all situations. Once your messages are
> > larger than the caches, you have already built memory pressure (by
> > getting outside the cache size), so the pipelining is bound by the
> > memory bandwidth.
> >
> > > This seems (wrong assumption?) like the kind of parameter I would need
> > > for large messages on a NIC. Coalescing data / large MTU, ...
> >
> > Sure, but there are hard limits imposed by the hardware, especially with
> > regard to intranode communications. Once you saturate the memory bus,
> > you hit a pretty hard limit.
> >
> >   George.
> >
> > > Joshua
> > >
> > > ------ Original Message ------
> > > Received: 02:15 PM CDT, 03/17/2017
> > > From: George Bosilca <bosi...@icl.utk.edu>
> > > To: Open MPI Users <users@lists.open-mpi.org>
> > > Subject: Re: [OMPI users] tuning sm/vader for large messages
> > >
> > > > Joshua,
> > > >
> > > > In shared memory the bandwidth depends on many parameters, including
> > > > the process placement and the size of the different cache levels.
> > > > In your particular case I guess that after 128k you are outside the
> > > > L2 cache (1/2 of the cache, in fact) and the bandwidth will drop as
> > > > the data needs to be flushed to main memory.
> > > >
> > > >   George.
> > > >
> > > > On Fri, Mar 17, 2017 at 1:47 PM, Joshua Mora <joshua_m...@usa.net> wrote:
> > > >
> > > > > Hello,
> > > > > I am trying to get the max bw for shared memory communications
> > > > > using the osu_[bw,bibw,mbw_mr] benchmarks.
> > > > > I am observing a peak at ~64k/128k msg size and then a drop
> > > > > instead of sustained bandwidth.
> > > > > What parameters or Linux config do I need to add to the default
> > > > > Open MPI settings to improve this?
> > > > > I am already using vader and knem.
> > > > >
> > > > > See below the one-way bandwidth, with the peak at 64k.
> > > > >
> > > > > # Size      Bandwidth (MB/s)
> > > > > 1                  1.02
> > > > > 2                  2.13
> > > > > 4                  4.03
> > > > > 8                  8.48
> > > > > 16                11.90
> > > > > 32                23.29
> > > > > 64                47.33
> > > > > 128               88.08
> > > > > 256              136.77
> > > > > 512              245.06
> > > > > 1024             263.79
> > > > > 2048             405.49
> > > > > 4096            1040.46
> > > > > 8192            1964.81
> > > > > 16384           2983.71
> > > > > 32768           5705.11
> > > > > 65536           7181.11
> > > > > 131072          6490.55
> > > > > 262144          4449.59
> > > > > 524288          4898.14
> > > > > 1048576         5324.45
> > > > > 2097152         5539.79
> > > > > 4194304         5669.76
> > > > >
> > > > > Thanks,
> > > > > Joshua
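
For reference, here is a minimal sketch of how to inspect the vader knobs
and make the single-copy choice explicit. The option and parameter names
(ompi_info's --level flag, btl_vader_single_copy_mechanism, the
ppr:1:socket mapping) are taken from Open MPI 2.x and may differ in other
releases, so treat this as a starting point rather than a verified recipe
for your system:

  # List every vader BTL MCA parameter with its default value and
  # description; this is the closest thing to built-in documentation
  # of the vader tuning knobs.
  ompi_info --param btl vader --level 9

  # Run osu_bw with one rank per socket, explicitly selecting the vader
  # BTL and requesting KNEM as the single-copy mechanism. vader normally
  # picks KNEM or CMA on its own when available; forcing it here simply
  # makes the choice explicit.
  mpirun -n 2 --map-by ppr:1:socket --bind-to core \
      --mca btl vader,self \
      --mca btl_vader_single_copy_mechanism knem \
      ./osu_bw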
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users