During the past 6 months I have been using GNU sort on a lot of files in the GByte range on my 64-core machine. I have discovered that GNU sort is bad at using all the cores.
I have built parsort (now distributed as part of GNU parallel) as a small wrapper around GNU sort, and it performs 2-3x better than GNU sort for files of this size on this machine.

The first step works great in GNU sort: here you really can use all your cores. But after the first step, it seems GNU sort does a merge sort of all the results, and this pegs a single CPU at 100% while the rest of the cores sit pretty idle. It seems the merging is not parallelized.

parsort does this a little better than GNU sort. Instead of doing a k-way merge at the final step, it does a 2-way merge on each core (a rough sketch of this pairwise scheme is at the end of this message). It still has to do a 2-way merge of all the data on a single core in the end, so it is not even close to optimal.

Wikipedia describes a way of doing a parallel merge: https://en.wikipedia.org/wiki/Merge_sort#Parallel_multiway_merge_sort

I am no C coder, and I respect that we want coreutils to be written in C. Unfortunately that means I cannot really help with the actual coding (at least not in a way that would not introduce a plethora of bugs). But looking at the pseudocode (https://en.wikipedia.org/wiki/Merge_sort#Pseudocode) it does not look that hard to implement. This method requires all the data to fit in memory, but there are even more advanced methods (such as https://www.sciencedirect.com/science/article/pii/S0304397500002875) that do not require this.

Could you consider implementing a parallel merge, so I can retire parsort?

(Also: thanks for maintaining a vital piece of software.)

/Ole
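
PS: To illustrate what I mean by the pairwise strategy, here is a rough C sketch (not parsort's actual code, and merging int arrays rather than lines) of a 2-way merge tree over k sorted runs, using POSIX threads:

  #include <pthread.h>
  #include <stdlib.h>

  struct run { int *a; size_t n; };
  struct merge_job { struct run left, right, out; };

  /* Ordinary 2-way merge of two sorted runs into a newly allocated run.  */
  static void *
  merge_two (void *arg)
  {
    struct merge_job *j = arg;
    size_t i = 0, k = 0, o = 0;
    j->out.n = j->left.n + j->right.n;
    j->out.a = malloc (j->out.n * sizeof *j->out.a);
    while (i < j->left.n && k < j->right.n)
      j->out.a[o++] = (j->left.a[i] <= j->right.a[k]
                       ? j->left.a[i++] : j->right.a[k++]);
    while (i < j->left.n)
      j->out.a[o++] = j->left.a[i++];
    while (k < j->right.n)
      j->out.a[o++] = j->right.a[k++];
    return NULL;
  }

  /* Merge k sorted runs by merging pairs in parallel, round by round.
     Takes ownership of (and frees) the input buffers.  Error checking
     is omitted for brevity.  */
  static struct run
  merge_tree (struct run *runs, size_t k)
  {
    while (k > 1)
      {
        size_t pairs = k / 2;
        struct merge_job *jobs = malloc (pairs * sizeof *jobs);
        pthread_t *tids = malloc (pairs * sizeof *tids);
        for (size_t p = 0; p < pairs; p++)
          {
            jobs[p].left = runs[2 * p];
            jobs[p].right = runs[2 * p + 1];
            pthread_create (&tids[p], NULL, merge_two, &jobs[p]);
          }
        for (size_t p = 0; p < pairs; p++)
          {
            pthread_join (tids[p], NULL);
            free (jobs[p].left.a);
            free (jobs[p].right.a);
            runs[p] = jobs[p].out;
          }
        if (k % 2)                  /* odd run out moves up a round */
          runs[pairs] = runs[k - 1];
        k = pairs + k % 2;
        free (jobs);
        free (tids);
      }
    return runs[0];
  }

Even in this sketch the final round pushes all the data through a single merge_two call on one thread, which is exactly the bottleneck I am describing. As far as I understand, the Wikipedia algorithm avoids this by splitting every run by rank, so that each core merges its own slice of all the runs, in every round including the last.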