There is already an ongoing discussion and an issue open about that: http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Gather-a-distributed-dataset-td3216.html
I am sadly currently time-pressed with other things, but if nobody else handles this, I expect to be able to work on that within two weeks.

Regards,
Alex

2015-01-28 10:46 GMT+01:00 John Sandiford (JIRA) <j...@apache.org>:

> John Sandiford created FLINK-1459:
> -------------------------------------
>
> Summary: Collect DataSet to client
> Key: FLINK-1459
> URL: https://issues.apache.org/jira/browse/FLINK-1459
> Project: Flink
> Issue Type: Improvement
> Reporter: John Sandiford
>
>
> Hi, I may well have missed something obvious here but I cannot find an
> easy way to extract the values in a DataSet to the client. Spark has
> collect, collectAsMap etc...
>
> (I need to pass the values from a small aggregated DataSet back to a
> machine learning library which is controlling the iterations.)
>
> The only way I could find to do this was to implement my own in memory
> OutputFormat. This is not ideal, but does work.
>
> Many thanks, John
>
>
>
> val env = ExecutionEnvironment.getExecutionEnvironment
>
> val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)
>
> val result = data.reduce((a, b) => a)
> val valuesOnClient = result.???
>
> env.execute("Simple example")
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)