Re: [DISCUSS] FLIP-1 : Fine grained recovery from task failures

Ufuk Celebi Wed, 13 Jul 2016 03:13:03 -0700

Thanks for this very first proposal! Both the proposed functionality
and the way you explained it are super nice. :-)

I think that this has been long overdue in Flink. :-) Having worked on
both the ExecutionGraph and IntermediateResults before, I agree that
these are the relevant components for this change.

Version 1:

- Conceptually I agree that this is the way to go. I think it's
relatively straightforward to do this as you describe (minus all the
surprises during implementation ;-))
- Very nice explanation with the figures!
- Since FLIPs will probably also function as documentation, we might
link to the nice figures in [1] for people who are not familiar with
the details of the ExecutionGraph.

[1] 
https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html#jobmanager-data-structures

Version 2:

- I think that the changes to the intermediate results and pinning
will be straightforward.
- An important follow-up for this (probably another FLIP?) will be how
we do memory management though. Right now the buffers for the
intermediate results come from the "network buffer pool", which is by
default very small (64MB). This is not a blocker for the
implementation of Version 2, but probably for a good user experience.
;-)
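For anyone wondering where the 64MB figure comes from: assuming the Flink 1.x defaults of 2048 network buffers (`taskmanager.network.numberOfBuffers`) of 32 KiB each (`taskmanager.memory.segment-size`), the pool size is simply their product. A minimal sketch; `BufferPoolSizing` is a hypothetical name, not a Flink class:

```java
// Hypothetical helper (not a Flink API): computes the network buffer pool
// size from the number of buffers and the memory segment size.
public class BufferPoolSizing {

    // Assumed Flink 1.x defaults for taskmanager.network.numberOfBuffers
    // and taskmanager.memory.segment-size.
    static final int DEFAULT_NUM_BUFFERS = 2048;
    static final int DEFAULT_SEGMENT_SIZE_BYTES = 32 * 1024;

    static long poolSizeBytes(int numBuffers, int segmentSizeBytes) {
        // Multiply as long to avoid int overflow for larger pools.
        return (long) numBuffers * segmentSizeBytes;
    }

    public static void main(String[] args) {
        long bytes = poolSizeBytes(DEFAULT_NUM_BUFFERS, DEFAULT_SEGMENT_SIZE_BYTES);
        System.out.println(bytes / (1024 * 1024) + " MiB"); // prints "64 MiB"
    }
}
```

Pinned intermediate results would hold on to buffers from this same pool for longer, which is why its size matters for Version 2.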

Public API changes:

- RestartStrategy: I would expect this to be interpreted as the
maximum number of total task failures
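To make the "total task failures" reading concrete: with fine-grained recovery, every single-task failure would count against one job-wide limit, regardless of which task failed. The sketch below assumes that reading; `TotalTaskFailureBudget` and its methods are hypothetical illustrations, not part of Flink's `RestartStrategy` API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Flink's RestartStrategy API): a failure budget
// that counts every task failure against one job-wide limit.
public class TotalTaskFailureBudget {

    private final int maxTotalFailures;
    private int totalFailures = 0;
    // Per-task counts are kept only for reporting; they do not affect the decision.
    private final Map<String, Integer> failuresPerTask = new HashMap<>();

    public TotalTaskFailureBudget(int maxTotalFailures) {
        this.maxTotalFailures = maxTotalFailures;
    }

    /** Records a failure of the given task; returns true if a restart is still allowed. */
    public boolean onTaskFailure(String taskId) {
        failuresPerTask.merge(taskId, 1, Integer::sum);
        totalFailures++;
        return totalFailures <= maxTotalFailures;
    }

    public int failuresOf(String taskId) {
        return failuresPerTask.getOrDefault(taskId, 0);
    }

    public static void main(String[] args) {
        TotalTaskFailureBudget budget = new TotalTaskFailureBudget(3);
        System.out.println(budget.onTaskFailure("map-1"));    // true  (1 of 3)
        System.out.println(budget.onTaskFailure("reduce-2")); // true  (2 of 3)
        System.out.println(budget.onTaskFailure("map-1"));    // true  (3 of 3)
        System.out.println(budget.onTaskFailure("sink-0"));   // false (budget exhausted)
    }
}
```

A per-task limit would instead let a job with many tasks survive far more failures in total than the number the user configured, which seems less intuitive.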

– Ufuk


On Wed, Jul 13, 2016 at 8:20 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> I added a FLIP document in the wiki:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures
>
> For now, this contains the link to the Google Doc and a link to this
> discussion thread. Once a Jira is created for this it should also be added
> there.
>
> On Tue, 12 Jul 2016 at 20:11 Chesnay Schepler <ches...@apache.org> wrote:
>
>> shouldn't the proposal be contained in the wiki instead of GoogleDocs?
>>
>> On 12.07.2016 19:55, Stephan Ewen wrote:
>> > Hi all!
>> >
>> > Here is the very first FLIP (FLink Improvement Proposal): Fine grained
>> > recovery from task failures
>> >
>> > It describes a proposed enhancement for reducing the work done during
>> > recovery.
>> >
>> >
>> https://docs.google.com/document/d/16S584XFzkfFu3MOfVCE0rHZ_JJgQrQuw9SXpanoMiMo
>> >
>> > Please comment in this mail thread, or in the GoogleDoc.
>> >
>> > Best,
>> > Stephan
>> >
>>
>>
