If at a certain msg size x you achieve X performance (MB/s) and at 2x msg size or higher you achieve Y performance, with Y significantly lower than X, is it possible to have a parameter that chops messages internally to size x in order to sustain X performance rather than letting it choke? A sort of flow control to avoid congestion? If that is possible, what would that parameter be for vader?
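In case it clarifies what I am after, here is a minimal sketch of the kind of control I mean, assuming vader exposes fragmentation knobs like the ones below (parameter names and values are assumptions on my side, to be verified with ompi_info; the numbers are purely illustrative, not recommendations):

  # List every vader BTL parameter with its current/default value.
  ompi_info --param btl vader --level 9

  # Illustrative run: lower the threshold at which vader switches to its
  # large-message path and cap the per-fragment size, so that large
  # messages get pipelined in smaller chunks.
  mpirun -n 2 --mca btl vader,self \
         --mca btl_vader_eager_limit 32768 \
         --mca btl_vader_max_send_size 65536 \
         ./osu_bw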
Other than the source code, is there any detailed documentation or study of vader-related parameters to improve the bandwidth at large message sizes? I did see some documentation for sm, but not for vader.

Thanks,
Joshua

------ Original Message ------
Received: 03:06 PM CDT, 03/17/2017
From: George Bosilca <bosi...@icl.utk.edu>
To: Joshua Mora <joshua_m...@usa.net>
Cc: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] tuning sm/vader for large messages

> On Fri, Mar 17, 2017 at 3:33 PM, Joshua Mora <joshua_m...@usa.net> wrote:
>
> > Thanks for the quick reply.
> > This test is between 2 cores that are on different CPUs. Say data has to
> > traverse the coherent fabric (e.g. QPI, UPI, cHT).
> > It has to go to main memory independently of cache size. Wrong assumption?
>
> Depends on the usage pattern. Some benchmarks have options to clean/flush
> the cache before each round of tests.
>
> > Can data be evicted from the cache and put into the cache of a second core
> > on a different CPU without placing it first in main memory?
>
> It would depend on the memory coherency protocol. Usually it gets marked as
> shared, and as a result it might not need to be pushed into main memory
> right away.
>
> > I am more thinking that there is a parameter that splits large messages
> > into smaller ones at 64k or 128k?
>
> Pipelining is not the answer to all situations. Once your messages are
> larger than the caches, you have already built memory pressure (by getting
> outside the cache size), so the pipelining is bound by the memory bandwidth.
>
> > This seems (wrong assumption?) like the kind of parameter I would need for
> > large messages on a NIC. Coalescing data / large MTU, ...
>
> Sure, but there are hard limits imposed by the hardware, especially with
> regard to intranode communications. Once you saturate the memory bus, you
> hit a pretty hard limit.
>
>   George.
>
> > Joshua
> >
> > ------ Original Message ------
> > Received: 02:15 PM CDT, 03/17/2017
> > From: George Bosilca <bosi...@icl.utk.edu>
> > To: Open MPI Users <users@lists.open-mpi.org>
> > Subject: Re: [OMPI users] tuning sm/vader for large messages
> >
> > > Joshua,
> > >
> > > In shared memory the bandwidth depends on many parameters, including the
> > > process placement and the size of the different cache levels. In your
> > > particular case I guess after 128k you are outside the L2 cache (half of
> > > the cache in fact) and the bandwidth will drop as the data needs to be
> > > flushed to main memory.
> > >
> > >   George.
> > >
> > > On Fri, Mar 17, 2017 at 1:47 PM, Joshua Mora <joshua_m...@usa.net> wrote:
> > >
> > > > Hello,
> > > > I am trying to get the max bandwidth for shared-memory communications
> > > > using the osu_[bw,bibw,mbw_mr] benchmarks.
> > > > I am observing a peak at ~64K/128K msg size, after which the bandwidth
> > > > drops instead of being sustained.
> > > > What parameters or Linux config do I need to add to the default
> > > > Open MPI settings to improve this?
> > > > I am already using vader and knem.
> > > >
> > > > See below one-way bandwidth with peak at 64K.
> > > >
> > > > # Size    Bandwidth (MB/s)
> > > > 1         1.02
> > > > 2         2.13
> > > > 4         4.03
> > > > 8         8.48
> > > > 16        11.90
> > > > 32        23.29
> > > > 64        47.33
> > > > 128       88.08
> > > > 256       136.77
> > > > 512       245.06
> > > > 1024      263.79
> > > > 2048      405.49
> > > > 4096      1040.46
> > > > 8192      1964.81
> > > > 16384     2983.71
> > > > 32768     5705.11
> > > > 65536     7181.11
> > > > 131072    6490.55
> > > > 262144    4449.59
> > > > 524288    4898.14
> > > > 1048576   5324.45
> > > > 2097152   5539.79
> > > > 4194304   5669.76
> > > >
> > > > Thanks,
> > > > Joshua
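For the record, the placement effect George describes can be isolated with explicit binding. A hedged sketch, assuming recent Open MPI option names and that btl_vader_single_copy_mechanism is available in builds with knem installed:

  # Both ranks on the same socket: data can stay within one socket's caches.
  mpirun -n 2 --map-by core --bind-to core --report-bindings ./osu_bw

  # One rank per socket: every transfer crosses the coherent fabric
  # (QPI/UPI/cHT), which is the configuration measured in the table above.
  mpirun -n 2 --map-by ppr:1:socket --bind-to core --report-bindings ./osu_bw

  # Optionally force the kernel-assisted single-copy path through knem,
  # which skips the intermediate shared-memory copy for large messages.
  mpirun -n 2 --map-by ppr:1:socket --bind-to core \
         --mca btl vader,self \
         --mca btl_vader_single_copy_mechanism knem \
         ./osu_bw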