Hi,
the Reducer-side sorting is done with an external merge-sort. The sorter
collects records in an in-memory buffer until it is completely filled,
sorts the buffer using quicksort, and spills the sorted result to disk (if
available a combiner is applied before spilling to reduce IO). After all
da
Hi,
I have another question. The reducer sorts its inputs before it starts
with computation. Which sorting algorithm it is using? In Flink I found
QuickSort, HeapSort and etc.
Does the sorting algorithm benefit from pre-sorted partitions. For
example, a MergeSort algorithm can sort the parti
Hi Fabian,
thank you very much! This answers my question.
BR,
Hilmi
Am 01.08.2016 um 22:29 schrieb Fabian Hueske:
Hi Hilmi,
the results of the combiner are usually not completely sorted and if they
are this property is not leveraged.
This is due to the following reasons:
1) a sort-combiner
Hi Hilmi,
the results of the combiner are usually not completely sorted and if they
are this property is not leveraged.
This is due to the following reasons:
1) a sort-combiner only sorts as much data as fits into memory. If there is
more data, the result consists of multiple sorted sequences.
2)
Hi,
I have a question regarding when data points are sorted when applying a
simple Map Reduce Job.
I have the following code:
data = readFromSource()
data.map().groupBy(0).reduce(...)
This code will be translated into the following execution plan:
map -> combiner -> hash partitioning a