[ https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
JIN SUN updated FLINK-10205:
----------------------------
    Description:
Today, DataSourceTask pulls InputSplits from the JobManager to achieve better performance. However, when a DataSourceTask fails and is rerun, it will not get the same splits as its previous execution. This can produce inconsistent results or even data corruption.

Furthermore, if two executions run at the same time (in the batch scenario), both executions should process the same splits.

We need to fix this issue to make the inputs of a DataSourceTask deterministic. The proposal is to save all splits in the ExecutionVertex and have the DataSourceTask pull its splits from there.

document:
[https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing]

  was:
Today, DataSourceTask pulls InputSplits from the JobManager to achieve better performance. However, when a DataSourceTask fails and is rerun, it will not get the same splits as its previous execution. This can produce inconsistent results or even data corruption.

Furthermore, if two executions run at the same time (in the batch scenario), both executions should process the same splits.

We need to fix this issue to make the inputs of a DataSourceTask deterministic. The proposal is to save all splits in the ExecutionVertex and have the DataSourceTask pull its splits from there.

> Batch Job: InputSplit Fault tolerant for DataSourceTask
> -------------------------------------------------------
>
>                 Key: FLINK-10205
>                 URL: https://issues.apache.org/jira/browse/FLINK-10205
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>    Affects Versions: 1.6.1, 1.6.2
>            Reporter: JIN SUN
>            Assignee: JIN SUN
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 168h
>  Remaining Estimate: 168h
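
The proposal in the description (record the splits per vertex so that every execution attempt sees the same ones) can be illustrated with a small, self-contained sketch. This is not Flink's implementation: `VertexSplitLog`, `AttemptSplitReader`, and `SplitReplayDemo` are hypothetical names, a plain `Iterator` stands in for the JobManager-side split assigner, and the real design is the one in the linked document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical per-vertex split log (an assumption for illustration, not
 * Flink's ExecutionVertex). It remembers, in order, every split handed out
 * to this vertex.
 */
final class VertexSplitLog<S> {
    // Stands in for the JobManager-side assigner shared by all vertices.
    private final Iterator<S> sharedAssigner;
    private final List<S> assigned = new ArrayList<>();

    VertexSplitLog(Iterator<S> sharedAssigner) {
        this.sharedAssigner = sharedAssigner;
    }

    /**
     * Returns the split at the given position for this vertex. A new split is
     * pulled from the shared assigner only the first time a position is
     * requested; later requests for that position replay the recorded split.
     */
    synchronized Optional<S> splitAt(int position) {
        if (position < 0 || position > assigned.size()) {
            throw new IllegalArgumentException("Splits must be requested in order: " + position);
        }
        if (position < assigned.size()) {
            return Optional.of(assigned.get(position));
        }
        synchronized (sharedAssigner) {
            if (!sharedAssigner.hasNext()) {
                return Optional.empty();
            }
            S split = sharedAssigner.next();
            assigned.add(split);
            return Optional.of(split);
        }
    }
}

/** One execution attempt of the source task; each attempt keeps its own cursor. */
final class AttemptSplitReader<S> {
    private final VertexSplitLog<S> log;
    private int next = 0;

    AttemptSplitReader(VertexSplitLog<S> log) {
        this.log = log;
    }

    Optional<S> nextSplit() {
        Optional<S> split = log.splitAt(next);
        if (split.isPresent()) {
            next++;
        }
        return split;
    }
}

final class SplitReplayDemo {
    public static void main(String[] args) {
        Iterator<String> assigner = Arrays.asList("split-0", "split-1", "split-2").iterator();
        VertexSplitLog<String> vertex = new VertexSplitLog<>(assigner);

        // The first attempt reads two splits and is then assumed to fail.
        AttemptSplitReader<String> first = new AttemptSplitReader<>(vertex);
        first.nextSplit();
        first.nextSplit();

        // The rerun starts at position 0 and replays the recorded splits
        // before pulling anything new from the shared assigner.
        AttemptSplitReader<String> rerun = new AttemptSplitReader<>(vertex);
        System.out.println(rerun.nextSplit().get()); // split-0
        System.out.println(rerun.nextSplit().get()); // split-1
        System.out.println(rerun.nextSplit().get()); // split-2
    }
}
```

Because every attempt reads through the same per-vertex log, two attempts running at the same time would also observe an identical split sequence, which is the determinism the issue asks for.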