Hi Padraig,
> From: Chen Guo <[email protected]> > > Any thoughts on the interesting jump at T=8? > Say we're sorting 32 lines with 8 threads, each thread would get 4 lines to > sort. If we sort with 7 threads, then 6 threads would get 4 lines, and the > last > thread would get 8 to sort. Thus, this last thread becomes kind of a > bottleneck. > > > A way around this would be, if sorting with 7 threads, have 6 threads sort 5 > lines and the last thread sort 2. A more "wow" example might be 1000 lines > with > 3 threads... We could have 250, 250, and 500, with 500 being the bottleneck, > or > 333, 333, and 334. > > To divide threads up this way, we'd need to at the very start do nlines / > nthreads for all the threads except 1, and nlines - (nthreads - 1) * (nlines > / > nthreads) for the last thread. However, this method implies creating all the > threads in a loop, which isn't as elegant as recursion. I've used this > approach > for a previous patch, but for some reason never thought of it here. I'll try > it > out and see how much the results differ. > I totally wasnt thinking straight. By nature this is a binary merge, coded into the very structures we used, thus making this style of division division very difficult to do elegantly. I think as long as the merge tree is binary, which is to say as long as standard mergesort is used (Glen's patch show the same jump), this jump will exist.
