Re: Map Reduce Sorting

2016-08-04 Thread Fabian Hueske
Hi, the reducer-side sorting is done with an external merge-sort. The sorter collects records in an in-memory buffer until it is completely filled, sorts the buffer using quicksort, and spills the sorted result to disk (if a combiner is available, it is applied before spilling to reduce I/O). After all da
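
The fill, sort, and spill steps described above can be sketched roughly as follows. This is a minimal illustration, not Flink's implementation: `external_sort`, `buffer_size`, and the `combine` hook are hypothetical names, Python's built-in sort stands in for the quicksort mentioned in the message, and the final merge of the spilled runs is the textbook external merge-sort step, assumed here because the message is cut off in the archive.

```python
import heapq
import os
import pickle
import tempfile


def external_sort(records, buffer_size, combine=None):
    """Yield `records` in sorted order while holding at most
    `buffer_size` records in memory during the spill phase."""
    run_paths = []  # one temporary file per sorted run
    buffer = []

    def spill():
        # Sort the full buffer in memory (built-in sort, standing in for quicksort).
        buffer.sort()
        # Optionally shrink the run with a combiner before writing it out.
        out = combine(buffer) if combine else buffer
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "wb") as f:
            for rec in out:
                pickle.dump(rec, f)
        run_paths.append(path)
        buffer.clear()

    for rec in records:
        buffer.append(rec)
        if len(buffer) >= buffer_size:
            spill()
    if buffer:
        spill()

    def read_run(path):
        # Stream one sorted run back from disk, record by record.
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    try:
        # Assumed final phase: k-way merge of all sorted runs.
        yield from heapq.merge(*(read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)
```

For example, `list(external_sort([5, 3, 9, 1, 7, 2, 8], buffer_size=3))` spills three sorted runs and merges them into one sorted sequence.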

Re: Map Reduce Sorting

2016-08-03 Thread Hilmi Yildirim
Hi, I have another question. The reducer sorts its inputs before it starts with the computation. Which sorting algorithm is it using? In Flink I found QuickSort, HeapSort, etc. Does the sorting algorithm benefit from pre-sorted partitions? For example, a MergeSort algorithm can sort the parti

Re: Map Reduce Sorting

2016-08-02 Thread Hilmi Yildirim
Hi Fabian, thank you very much! This answers my question. BR, Hilmi On 01.08.2016 at 22:29, Fabian Hueske wrote: Hi Hilmi, the results of the combiner are usually not completely sorted, and if they are, this property is not leveraged. This is due to the following reasons: 1) a sort-combiner

Re: Map Reduce Sorting

2016-08-01 Thread Fabian Hueske
Hi Hilmi, the results of the combiner are usually not completely sorted, and if they are, this property is not leveraged. This is due to the following reasons: 1) a sort-combiner only sorts as much data as fits into memory. If there is more data, the result consists of multiple sorted sequences. 2)
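
Reason 1 can be illustrated with a small sketch. It assumes a hypothetical memory-bounded sort-combiner, `sort_combine`, that sums the values of `(key, value)` pairs; the names and the summing logic are illustrative and not taken from Flink. Each buffer-sized chunk is sorted and combined on its own, so the overall output is a concatenation of sorted sequences rather than one sorted sequence.

```python
from itertools import groupby, islice
from operator import itemgetter


def sort_combine(records, buffer_size):
    """Hypothetical sort-combiner: sorts and pre-aggregates only as many
    (key, value) pairs at a time as fit into the buffer."""
    it = iter(records)
    while True:
        # Take the next buffer-sized chunk and sort it by key.
        chunk = sorted(islice(it, buffer_size), key=itemgetter(0))
        if not chunk:
            return
        # Combine equal keys within this chunk only.
        for key, group in groupby(chunk, key=itemgetter(0)):
            yield key, sum(value for _, value in group)


def is_sorted_by_key(pairs):
    """True if the keys appear in non-decreasing order."""
    keys = [key for key, _ in pairs]
    return all(a <= b for a, b in zip(keys, keys[1:]))


records = [("b", 1), ("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1)]
small_buffer = list(sort_combine(records, buffer_size=3))
large_buffer = list(sort_combine(records, buffer_size=6))
```

With `buffer_size=3` the result consists of two sorted sequences and the key `"a"` appears in both, so `is_sorted_by_key(small_buffer)` is false. With `buffer_size=6` all records fit into one buffer and the output is fully sorted.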

Map Reduce Sorting

2016-08-01 Thread Hilmi Yildirim
Hi, I have a question about when data points are sorted in a simple Map Reduce job. I have the following code:
    data = readFromSource()
    data.map().groupBy(0).reduce(...)
This code will be translated into the following execution plan: map -> combiner -> hash partitioning a
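
The first three stages of that plan can be sketched as plain functions. This is only an illustration under assumptions: the word-count logic, the function names, and the CRC32-based partitioner are hypothetical, the combiner here is a simple hash-based pre-aggregation rather than a sort-based one, and the stages after hash partitioning are left out because the message is truncated in the archive.

```python
import zlib
from collections import defaultdict


def map_stage(lines):
    """Map: emit a (key, 1) pair per word (word count is an assumed example)."""
    for line in lines:
        for word in line.split():
            yield word, 1


def combine_stage(pairs):
    """Combiner: pre-aggregate values per key on the sender side."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def hash_partition(pairs, num_partitions):
    """Hash partitioning: route each key to exactly one partition.
    CRC32 is used so the assignment is stable across processes."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        index = zlib.crc32(str(key).encode("utf-8")) % num_partitions
        partitions[index].append((key, value))
    return partitions


# Two parallel mapper inputs (made-up data).
mapper_inputs = [["a b a", "c a"], ["b b", "a c"]]
shuffled = [
    hash_partition(combine_stage(map_stage(lines)), num_partitions=2)
    for lines in mapper_inputs
]
```

Because the partition index depends only on the key, the partial results for a given key from both mappers land in the same partition index, which is what lets a single downstream task see all values of that key.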