I don't want to push it up. I just want to sustain the same bandwidth by sending at that optimal size. I'd like to see constant bandwidth from that size upward, not a significant drop once I cross a certain message size.
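To be concrete about what I mean by "chopping", here is a minimal sketch of doing the chunking at the application level, i.e. what I was hoping the BTL could do internally. CHUNK_SIZE is just the sweet spot from the measurements quoted below, not an actual vader parameter:

/* Minimal sketch: split one large transfer into fixed-size pieces so
 * that each piece stays at the "fast" message size.  CHUNK_SIZE is a
 * made-up value taken from my osu_bw peak, not a vader setting. */
#include <mpi.h>
#include <stddef.h>

#define CHUNK_SIZE (64 * 1024)

static void send_chunked(const char *buf, size_t len, int dst, MPI_Comm comm)
{
    size_t off = 0;
    while (off < len) {
        size_t n = (len - off > CHUNK_SIZE) ? CHUNK_SIZE : (len - off);
        MPI_Send(buf + off, (int)n, MPI_BYTE, dst, 0, comm);
        off += n;
    }
}

static void recv_chunked(char *buf, size_t len, int src, MPI_Comm comm)
{
    size_t off = 0;
    while (off < len) {
        size_t n = (len - off > CHUNK_SIZE) ? CHUNK_SIZE : (len - off);
        MPI_Recv(buf + off, (int)n, MPI_BYTE, src, 0, comm, MPI_STATUS_IGNORE);
        off += n;
    }
}

If the hardware memory-bandwidth ceiling really is the limit, as you explain below, then I understand this buys nothing; I mainly want to confirm there is no knob I am missing. (A stripped-down bandwidth loop to see where the drop appears is sketched at the bottom of this mail, after the quoted thread.)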
------ Original Message ------
Received: 05:11 PM CDT, 03/20/2017
From: George Bosilca <bosi...@icl.utk.edu>
To: Joshua Mora <joshua_m...@usa.net>
Cc: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] tuning sm/vader for large messages

> On Mon, Mar 20, 2017 at 12:45 PM, Joshua Mora wrote:
>
> > If at certain x msg size you achieve X performance (MB/s) and at 2x msg size
> > or higher you achieve Y performance, being Y significantly lower than X, is it
> > possible to have a parameter that chops messages internally to x size in order
> > to sustain X performance rather than let it choke ?
>
> Unfortunately not. After a certain message size you hit the hardware memory
> bandwidth limit, and no pipeline can help. To push it up you will need to
> have a single copy instead of 2, but vader should do this by default as
> long as KNEM or CMA are available on the machine.
>
> George.
>
> > sort of flow control to avoid congestion ?
> > If that is possible, what would be that parameter for vader ?
> >
> > Other than source code, is there any detailed documentation/studies of vader
> > related parameters to improve the bandwidth at large message size ? I did see
> > some documentation for sm, but not for vader.
> >
> > Thanks,
> > Joshua
> >
> >
> > ------ Original Message ------
> > Received: 03:06 PM CDT, 03/17/2017
> > From: George Bosilca
> > To: Joshua Mora
> > Cc: Open MPI Users
> > Subject: Re: [OMPI users] tuning sm/vader for large messages
> >
> > > On Fri, Mar 17, 2017 at 3:33 PM, Joshua Mora wrote:
> > >
> > > > Thanks for the quick reply.
> > > > This test is between 2 cores that are on different CPUs. Say data has to
> > > > traverse coherent fabric (eg. QPI, UPI, cHT).
> > > > It has to go to main memory independently of cache size. Wrong assumption ?
> > >
> > > Depends on the usage pattern. Some benchmarks have options to clean/flush
> > > the cache before each round of tests.
> > >
> > > > Can data be evicted from cache and put into cache of second core on different
> > > > CPU without placing it first in main memory ?
> > >
> > > It would depend on the memory coherency protocol. Usually it gets marked as
> > > shared, and as a result it might not need to be pushed into main memory
> > > right away.
> > >
> > > > I am more thinking that there is a parameter that splits large messages in
> > > > smaller ones at 64k or 128k ?
> > >
> > > Pipelining is not the answer to all situations. Once your messages are
> > > larger than the caches, you already built memory pressure (by getting
> > > outside the cache size) so the pipelining is bound by the memory bandwidth.
> > >
> > > > This seems (wrong assumption ?) like the kind of parameter I would need for
> > > > large messages on a NIC. Coalescing data / large MTU,...
> > >
> > > Sure, but there are hard limits imposed by the hardware, especially with
> > > regards to intranode communications. Once you saturate the memory bus, you
> > > hit a pretty hard limit.
> > >
> > > George.
> > > > Joshua
> > > >
> > > >
> > > > ------ Original Message ------
> > > > Received: 02:15 PM CDT, 03/17/2017
> > > > From: George Bosilca
> > > > To: Open MPI Users
> > > > Subject: Re: [OMPI users] tuning sm/vader for large messages
> > > >
> > > > > Joshua,
> > > > >
> > > > > In shared memory the bandwidth depends on many parameters, including the
> > > > > process placement and the size of the different cache levels. In your
> > > > > particular case I guess after 128k you are outside the L2 cache (1/2 of the
> > > > > cache in fact) and the bandwidth will drop as the data need to be flushed
> > > > > to main memory.
> > > > >
> > > > > George.
> > > > >
> > > > > On Fri, Mar 17, 2017 at 1:47 PM, Joshua Mora wrote:
> > > > >
> > > > > > Hello,
> > > > > > I am trying to get the max bw for shared memory communications using
> > > > > > osu_[bw,bibw,mbw_mr] benchmarks.
> > > > > > I am observing a peak at ~64k/128K msg size and then drops instead of
> > > > > > sustaining it.
> > > > > > What parameters or linux config do I need to add to default openmpi settings
> > > > > > to get this improved ?
> > > > > > I am already using vader and knem.
> > > > > >
> > > > > > See below one way bandwidth with peak at 64k.
> > > > > >
> > > > > > # Size      Bandwidth (MB/s)
> > > > > > 1                 1.02
> > > > > > 2                 2.13
> > > > > > 4                 4.03
> > > > > > 8                 8.48
> > > > > > 16               11.90
> > > > > > 32               23.29
> > > > > > 64               47.33
> > > > > > 128              88.08
> > > > > > 256             136.77
> > > > > > 512             245.06
> > > > > > 1024            263.79
> > > > > > 2048            405.49
> > > > > > 4096           1040.46
> > > > > > 8192           1964.81
> > > > > > 16384          2983.71
> > > > > > 32768          5705.11
> > > > > > 65536          7181.11
> > > > > > 131072         6490.55
> > > > > > 262144         4449.59
> > > > > > 524288         4898.14
> > > > > > 1048576        5324.45
> > > > > > 2097152        5539.79
> > > > > > 4194304        5669.76
> > > > > >
> > > > > > Thanks,
> > > > > > Joshua
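For anyone who wants to reproduce the numbers quoted above without the full OSU suite, below is a stripped-down windowed bandwidth loop in plain C. It is my own minimal sketch, not the osu_bw source; WINDOW, LOOPS and MAX_SIZE are arbitrary values. Run it with two ranks pinned to the pair of cores of interest (one per socket in my case) and watch where the reported bandwidth falls off relative to the cache sizes.

/* Minimal windowed bandwidth loop, roughly in the spirit of osu_bw.
 * Rank 0 streams WINDOW non-blocking sends per iteration, rank 1
 * posts matching receives and answers with a 1-byte ack.
 * WINDOW, LOOPS and MAX_SIZE are arbitrary, not OSU defaults. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WINDOW   64
#define LOOPS    100
#define MAX_SIZE (4 * 1024 * 1024)

int main(int argc, char **argv)
{
    int rank;
    char ack = 0;
    MPI_Request req[WINDOW];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(MAX_SIZE);

    for (size_t size = 1; size <= MAX_SIZE; size *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        for (int l = 0; l < LOOPS; l++) {
            if (rank == 0) {
                for (int w = 0; w < WINDOW; w++)
                    MPI_Isend(buf, (int)size, MPI_BYTE, 1, 0,
                              MPI_COMM_WORLD, &req[w]);
                MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
                MPI_Recv(&ack, 1, MPI_BYTE, 1, 1, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                for (int w = 0; w < WINDOW; w++)
                    MPI_Irecv(buf, (int)size, MPI_BYTE, 0, 0,
                              MPI_COMM_WORLD, &req[w]);
                MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
                MPI_Send(&ack, 1, MPI_BYTE, 0, 1, MPI_COMM_WORLD);
            }
        }

        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("%8zu  %10.2f MB/s\n", size,
                   (double)size * WINDOW * LOOPS / (t1 - t0) / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users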