Re: Order groups by their keys

2015-07-19 Thread hagersaleh
Why is .sortPartition(1, Order.ASCENDING) not found, although I use the library?

Re: Order groups by their keys

2015-07-15 Thread Fabian Hueske
Yes, going to parallelism 1 is another option but you don't have to use a fake-reduce to enforce sorting. You can simply do: DataSet> result = ... result .sortPartition(1, Order.ASCENDING).setParallelism(1) // sort on first String field .output(...); Fabian 2015-07-15 15:32 GMT+02:00 Matthia

Re: Order groups by their keys

2015-07-15 Thread Matthias J. Sax
Hi Robert, global sorting of the final output is currently not supported by Flink out of the box. The reason is that a global sort requires all data to be processed by a single node (which contradicts data parallelism). For small output, you could use a final "reduce" with no key (i.e., all data go

Re: Order groups by their keys

2015-07-15 Thread Fabian Hueske
Hi Robert, there are two issues involved here. 1) Flink does not support totally ordered parallel output out of the box. Fully sorting data in parallel requires range partitioning, which in turn requires some knowledge of the data (the distribution of the key values) to produce balanced partitions. Flink doe
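To illustrate the range-partitioning idea mentioned in the message above, here is a minimal sketch in plain Java. It does not use the Flink API. The class name RangePartitionSort and its methods are hypothetical, and for simplicity the whole input serves as the "sample" from which the range boundaries are picked. Each range is sorted independently (this is the step that could run in parallel), and concatenating the ranges in order gives a totally ordered result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of Flink.
public class RangePartitionSort {

    // Pick (numPartitions - 1) boundary keys from a sorted copy of the sample.
    // Assumption: the sample reflects the key distribution of the full data.
    static List<String> chooseSplitters(List<String> sample, int numPartitions) {
        List<String> splitters = new ArrayList<>();
        if (sample.isEmpty()) {
            return splitters;
        }
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        for (int i = 1; i < numPartitions; i++) {
            splitters.add(sorted.get(i * sorted.size() / numPartitions));
        }
        return splitters;
    }

    // A key goes to partition p, where p is the number of splitters <= key.
    static int partitionFor(String key, List<String> splitters) {
        int p = 0;
        while (p < splitters.size() && key.compareTo(splitters.get(p)) >= 0) {
            p++;
        }
        return p;
    }

    // Range-partition the keys, sort each partition, concatenate in range order.
    static List<String> sortInRanges(List<String> keys, int numPartitions) {
        List<String> splitters = chooseSplitters(keys, numPartitions);
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, splitters)).add(key);
        }
        List<String> result = new ArrayList<>();
        for (List<String> partition : partitions) {
            Collections.sort(partition); // independent per range, so parallelizable
            result.addAll(partition);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy keys with heavy duplication, which leads to unevenly filled ranges.
        List<String> keys = Arrays.asList("N", "A", "R", "A", "N", "R", "A");
        System.out.println(sortInRanges(keys, 3));
    }
}
```

With only three distinct keys, the ranges in this toy run end up unevenly filled, which is the balancing problem the message points to: good boundaries depend on knowing how the key values are distributed.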

Order groups by their keys

2015-07-15 Thread Robert Schmidtke
Hey everyone, I'm currently trying to implement TPC-H Q1 and that involves ordering of results. Now I'm not too familiar with the transformations yet, however for the life of me I cannot figure out how to get it to work. Consider the following toy example: final ExecutionEnvironment env = Executi