[ https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653116#comment-16653116 ]
ASF GitHub Bot commented on FLINK-10205:
----------------------------------------

TisonKun commented on issue #6684: [FLINK-10205] Batch Job: InputSplit Fault tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6684#issuecomment-430519357

@tillrohrmann I would like to reiterate what @isunjin and @wenlong88 emphasized:

> @wenlong88: the framework do not know how the input assigner assign splits to subtask and we can't reconstruct the assigner in region failover

> @isunjin: otherwise the logic to make data consistent is complicated

For the implementation, we can record each input split against its task at the moment it is assigned; this is the most general approach. Conceptually, we can still think of an input split as something that any one task may process, but technically @isunjin's implementation is a concise one.

> Batch Job: InputSplit Fault tolerant for DataSourceTask
> -------------------------------------------------------
>
>                 Key: FLINK-10205
>                 URL: https://issues.apache.org/jira/browse/FLINK-10205
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>    Affects Versions: 1.6.1, 1.7.0, 1.6.2
>            Reporter: JIN SUN
>            Assignee: JIN SUN
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today a DataSourceTask pulls InputSplits from the JobManager to achieve better
> performance. However, when a DataSourceTask fails and is rerun, it will not get
> the same splits as its previous execution. This can introduce inconsistent
> results or even data corruption.
> Furthermore, if two executions run at the same time (in the batch
> scenario), these two executions should process the same splits.
> We need to fix this issue to make the inputs of a DataSourceTask
> deterministic. The proposal is to save all splits into the ExecutionVertex, and
> the DataSourceTask will pull splits from there.
> document:
> [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing]
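
To make the "record the split on assignment" idea concrete, here is a minimal, self-contained sketch. It is not the code from PR #6684 and does not use Flink's real classes: RecordingSplitAssigner, getNextSplit, and resetForRestart are hypothetical names, splits are plain strings, and the per-subtask history stands in for whatever would be stored on the ExecutionVertex.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only: remembers which splits each subtask was
 * given, so a restarted subtask replays the same splits in the same order.
 */
final class RecordingSplitAssigner {
    // Splits that have not been handed to any subtask yet.
    private final Deque<String> unassigned;
    // Splits handed to each subtask, in assignment order (the "record").
    private final Map<Integer, List<String>> assignedBySubtask = new HashMap<>();
    // Position of each subtask's current attempt within its recorded history.
    private final Map<Integer, Integer> cursorBySubtask = new HashMap<>();

    RecordingSplitAssigner(List<String> splits) {
        this.unassigned = new ArrayDeque<>(splits);
    }

    /** Returns the next split for the subtask, or null when none is left. */
    synchronized String getNextSplit(int subtask) {
        List<String> history =
                assignedBySubtask.computeIfAbsent(subtask, k -> new ArrayList<>());
        int cursor = cursorBySubtask.getOrDefault(subtask, 0);
        String split;
        if (cursor < history.size()) {
            // A restarted attempt replays what the earlier attempt received.
            split = history.get(cursor);
        } else {
            // First time this position is reached: take a fresh split and record it.
            split = unassigned.poll();
            if (split == null) {
                return null;
            }
            history.add(split);
        }
        cursorBySubtask.put(subtask, cursor + 1);
        return split;
    }

    /** Called when a subtask fails and is rerun: rewind its replay position. */
    synchronized void resetForRestart(int subtask) {
        cursorBySubtask.put(subtask, 0);
    }
}
```

With splits s0..s3, if subtask 0 receives s0 and s2 and then fails, calling resetForRestart(0) makes its next requests return s0 and s2 again before any new split is handed out. The sketch keeps one replay position per subtask, so it does not cover two attempts of the same subtask running at the same time; that case would need a position per attempt.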