Re: DataSet: CombineHint heuristics

2017-09-05 Thread Urs Schoenenberger
Hi Gábor, thank you very much for your explanation, that makes a lot of sense. Best regards, Urs On 05.09.2017 14:32, Gábor Gévay wrote: > Hi Urs, > > Yes, the 1/10th ratio is just a very loose rule of thumb. I would > suggest to try both the SORT and HASH strategies with a workload that > is a

Re: DataSet: CombineHint heuristics

2017-09-05 Thread Gábor Gévay
Hi Urs, Yes, the 1/10th ratio is just a very loose rule of thumb. I would suggest trying both the SORT and HASH strategies with a workload that is as similar as possible to your production workload (similar data, similar parallelism, etc.), and seeing which one is faster for your specific use case.

Re: DataSet: CombineHint heuristics

2017-08-31 Thread Aljoscha Krettek
Hi, I would say that your assumption is correct and that the COMBINE strategy does in fact also depend on the ratio "#total records / #records that fit into a single Sorter/Hashtable". I'm CC'ing Fabian, just to be sure. He knows that stuff better than I do. Best, Aljoscha > On 31. Aug 2017,
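
To make the ratio argument concrete, here is a rough toy model in Python. It is not Flink code and not how Flink's combiners are actually implemented. The functions `hash_combine` and `sort_combine` and the `capacity` parameter are hypothetical names for this illustration, and capacity is counted in entries rather than bytes. The sketch assumes that a hash-based combiner keeps one entry per distinct key and flushes when the table is full, while a sort-based combiner buffers raw records and combines them only when the buffer is full.

```python
from itertools import groupby
from operator import itemgetter


def hash_combine(records, capacity):
    """Toy hash-based combiner: one table slot per distinct key.

    Returns how many partially combined records it emits downstream.
    ASSUMPTION: the table is flushed completely when a new key does not fit.
    """
    table = {}
    emitted = 0
    for key, value in records:
        if key not in table and len(table) >= capacity:
            emitted += len(table)  # table is full: emit everything and start over
            table.clear()
        table[key] = table.get(key, 0) + value
    return emitted + len(table)  # final flush


def sort_combine(records, capacity):
    """Toy sort-based combiner: the buffer holds `capacity` raw records.

    ASSUMPTION: a full buffer is sorted, equal keys are merged, and one
    record per distinct key in that buffer is emitted.
    """
    def flush(buffer):
        buffer.sort(key=itemgetter(0))
        return sum(1 for _ in groupby(buffer, key=itemgetter(0)))

    buffer = []
    emitted = 0
    for record in records:
        buffer.append(record)
        if len(buffer) == capacity:
            emitted += flush(buffer)
            buffer.clear()
    return emitted + flush(buffer)  # final flush of the partial buffer


# 10,000 records spread evenly over 100 keys, arriving in round-robin order.
records = [(i % 100, 1) for i in range(10_000)]

# All 100 keys fit in the hash table, but only 200 raw records fit in the sort buffer.
print(hash_combine(records, 200))  # 100
print(sort_combine(records, 200))  # 5000

# Neither structure can hold all keys: no combining happens in this input order.
print(hash_combine(records, 50))   # 10000
print(sort_combine(records, 50))   # 10000
```

In this model, the hash variant combines well as soon as all distinct keys fit into the table, whereas the sort variant can only reduce each buffer-sized run of records to the distinct keys in that run, so its effectiveness depends on how many records fit into the buffer relative to the total. Actual runtimes also depend on sorting versus hashing cost, so this only illustrates the output volume, not which strategy is faster.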

DataSet: CombineHint heuristics

2017-08-31 Thread Urs Schoenenberger
Hi all, I was wondering about the heuristics for CombineHint: Flink uses SORT by default, but the doc for HASH says that we should expect it to be faster if the number of keys is less than 1/10th of the number of records. HASH should be faster if it is able to combine a lot of records, which hap