Re: Questions re: ExecutionGraph & ResultPartitions for interactive use a la Spark

Aljoscha Krettek Tue, 05 Jan 2016 02:55:28 -0800

Hi,
these are certainly valid use cases. As far is I know, the people who know most 
in this area are on vacation right now. They should be back in a week, I think.


They should be able to give you a proper description of the current situation 
and some pointers.

Cheers,
Aljoscha
> On 04 Jan 2016, at 22:47, kovas boguta <kovas.bog...@gmail.com> wrote:
> 
> I'm impressed with the Flink API, it seems simpler and more composable than 
> what I've seen elsewhere.
> 
> I'm trying to see how to achieve a more interactive, REPL-driven experience, 
> similar to Spark. I'm consuming Flink from Clojure. 
> 
> For now I'm only interested in smaller clusters & interactive usage, so the 
> failure recovery aspect of RDDs is less important relative to simply caching 
> intermediate results (in this context, keeping the ResultPartitions 
> partitions around even when theres no consuming tasks) , and dynamically 
> extending the jobgraph. 
> 
> Three questions:
> 
> 1) How can I prevent ResultPartitions from being released?
> 
> In interactive use, RPs should not necessarily be released when there are no 
> pending tasks to consume them. 
> 
> Looking at the code, its hard to understand the high-level logic of who 
> triggers their release, how the refcounting works, etc. For instance, is 
> releasePartitionsProducedBy called by the producer, or the consumer, or both? 
> Is the refcount initialized at the beginning of the task setup, and 
> decremented every time its read out? 
> 
> Ideally, I could force certain ResultPartitions to only be manually 
> releasable, so I can consume them over and over.  
> 
> 
> 2) Can I call attachJobGraph on the ExecutionGraph after the job has begun 
> executing to add more nodes?
> 
> I read that Flink does not support changing the running the topology. But 
> what about extending the topology to add more nodes? 
> 
> If the IntermediateResultPartition are just sitting around from previously 
> completely tasks, this seems straightforward in principle.. would one have to 
> fiddle with the scheduler/event system to kick things off again?
> 
> 3) Can I have a ConnectionManager without a TaskManager?
> 
> For interactive use, I want to selectively pull data from ResultPartitions 
> into my local REPL process. But I don't want my local process to be a 
> TaskManager node, and have tasks assigned to it. 
> 
> So this question boils down to, how to hook a ConnectionManager into the 
> Flink communication/actor system?
> 
> Thanks!
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>

Re: Questions re: ExecutionGraph & ResultPartitions for interactive use a la Spark

Reply via email to