Re: Gather a distributed dataset

Ufuk Celebi Thu, 15 Jan 2015 02:30:47 -0800

On 13 Jan 2015, at 16:50, Stephan Ewen <se...@apache.org> wrote:

> Hi!
> 
> To follow up on what Ufuk explained:
> 
> - Ufuk is right, the problem is not getting the data set.
> https://github.com/apache/flink/pull/210 does that for anything that is not
> too gigantic, which is a good start. I think we should merge this as soon
> as we agree on the signature and names of the API methods. We can swap the
> internal realization for something more robust later.
> 
> - For anything that just issues a program and wants the result back, this
> is actually perfectly fine.
> 
> - For true interactive programs, we need to backtrack to intermediate
> results (rather than to the source) to avoid re-executing large parts. This
> is the biggest missing piece, next to the persistent materialization of
> intermediate results (Ufuk is working on this). The logic is the same as
> for fault tolerance, so it is part of that development.
> 
> @alexander: I want to create the feature branch for that on Thursday. Are
> you interested in contributing to that feature?
> 
> - For streaming results continuously back, we need a mechanism other than
> the accumulators. Let's create a design doc or thread and get working on
> that. Probably involves adding another set of akka messages from TM -> JM
> -> Client. Or something like an extension to the BLOB manager for streams?


For streaming results back, we can use the same mechanisms used by the task 
managers. Let me add documentation (FLINK-1373) for the network stack this week.
