RE: Gather a distributed dataset

2015-01-27 Thread Kruse, Sebastian
From: Ufuk Celebi [mailto:u...@apache.org] Sent: Monday, 12 January 2015 12:06 To: dev@flink.apache.org Subject: Re: Gather a distributed dataset Hey Alexander, On 12 Jan 2015, at 11:42, Alexander Alexandrov wrote: > Hi there, > > I wished for intermediate datasets, and S

Re: Gather a distributed dataset

2015-01-16 Thread Alexander Alexandrov
Thanks, I will have a look at your comments tomorrow and create a PR which should supersede 210. BTW, is there already a test case where I can see the suggested way to do staged execution with the new ExecutionEnvironment API? I thought about your second remark as well. The following lines pitc

Re: Gather a distributed dataset

2015-01-16 Thread Stephan Ewen
@Alex That sounds great. I added a few inline comments to PR 210 and then it is good to merge. If you want, feel free to fix it up and we will merge it. Feel free to also add (or suggest and stub) more of such functions. Is that what you meant by "designing interfaces"? Here is a thought that cr

Re: Gather a distributed dataset

2015-01-15 Thread Ufuk Celebi
On 13 Jan 2015, at 16:50, Stephan Ewen wrote: > Hi! > > To follow up on what Ufuk explained: > > - Ufuk is right, the problem is not getting the data set. > https://github.com/apache/flink/pull/210 does that for anything that is not > too gigantic, which is a good start. I think we should merge

Re: Gather a distributed dataset

2015-01-15 Thread Alexander Alexandrov
@Stephan: yes, I would like to contribute (e.g. I can design the interfaces and merge 210). Please reply with more information once you have the branch; I can find some time for that next week (at the expense of FLINK-1347 which hopefully can wait

Re: Gather a distributed dataset

2015-01-13 Thread Stephan Ewen
Hi! To follow up on what Ufuk explained: - Ufuk is right, the problem is not getting the data set. https://github.com/apache/flink/pull/210 does that for anything that is not too gigantic, which is a good start. I think we should merge this as soon as we agree on the signature and names of the AP

Re: Gather a distributed dataset

2015-01-12 Thread Alexander Alexandrov
Thanks, I am currently looking at the new ExecutionEnvironment API. > I think Stephan is working on the scheduling to support this kind of programs. @Stephan: is there a feature branch for that somewhere? 2015-01-12 12:05 GMT+01:00 Ufuk Celebi : > Hey Alexander, > > On 12 Jan 2015, at 11:42, Al

Re: Gather a distributed dataset

2015-01-12 Thread Ufuk Celebi
Hey Alexander, On 12 Jan 2015, at 11:42, Alexander Alexandrov wrote: > Hi there, > > I wished for intermediate datasets, and Santa Ufuk made my wishes come true > (thank you, Santa)! > > Now that FLINK-986 is in the mainline, I want to ask some practical > questions. > > In Spark, there is a

RE: Gather a distributed dataset

2015-01-12 Thread Paris Carbone
exand...@gmail.com] Sent: Monday, January 12, 2015 11:42 AM To: dev@flink.apache.org Subject: Gather a distributed dataset Hi there, I wished for intermediate datasets, and Santa Ufuk made my wishes come true (thank you, Santa)! Now that FLINK-986 is in the mainline, I want to ask some practical ques

Gather a distributed dataset

2015-01-12 Thread Alexander Alexandrov
Hi there, I wished for intermediate datasets, and Santa Ufuk made my wishes come true (thank you, Santa)! Now that FLINK-986 is in the mainline, I want to ask some practical questions. In Spark, there is a way to put a value from the local driver to the distributed runtime via val x = env.paral
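For readers following the driver-to-runtime question above, here is a minimal sketch of the round trip (local collection into a distributed DataSet, then results gathered back to the client) as it can be written with the Scala DataSet API of later Flink 1.x releases, which expose `ExecutionEnvironment.fromCollection` and `DataSet.collect()`. It assumes the `flink-scala` dependency is on the classpath; the object name `GatherSketch` and the helper `doubleAndGather` are hypothetical and exist only for illustration. It is not the API proposed in PR 210, and like any gather it is only sensible for results that are not too large.

```scala
// Assumption: flink-scala (Flink 1.x Scala DataSet API) is on the classpath.
import org.apache.flink.api.scala._

// Hypothetical example object, not part of Flink.
object GatherSketch {
  // Ships a local collection into the Flink runtime, transforms it in
  // parallel, and gathers the result back into the client JVM.
  def doubleAndGather(values: Seq[Int]): Seq[Int] = {
    // Outside a cluster this returns a local execution environment.
    val env = ExecutionEnvironment.getExecutionEnvironment
    // Driver -> runtime: turn a local collection into a distributed DataSet.
    val ds: DataSet[Int] = env.fromCollection(values)
    // Runtime -> driver: collect() triggers execution and returns all elements.
    ds.map(_ * 2).collect()
  }

  def main(args: Array[String]): Unit =
    // Element order is not guaranteed across parallel tasks, so sort first.
    println(doubleAndGather(Seq(1, 2, 3)).sorted)
}
```

Because `collect()` materializes every element in the client, it fits the "not too gigantic" case discussed in the thread; larger results would normally be written to a sink instead.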