[ https://issues.apache.org/jira/browse/FLINK-12405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruidong Li updated FLINK-12405: ------------------------------- Description: The new {{ResultPartitionType}} : {{BLOCKING_PERSISTENT}} is similar to {{BLOCKING}} except it might be consumed for several times and will be released after TM shutdown or {{ResultPartition}} removal request. This is the basis for Interactive Programming. Here is the brief changes: * Introduce {{ResultPartitionType}} called {{BLOCKING_PERSISTENT}} * Introduce {{BlockingShuffleOutputFormat}} which contains a user specified {{IntermediateDataSetID}}(passed from TableAPI in later PR) * when {{JobGraphGenerator}} sees a {{GenericDataSinkBase}} with {{BlockingShuffleOutputFormat}}, it creates a {{IntermediateDataSet}} with this id, then add it to its predecessor, the {{OutputFormatVertex}} for this {{GenericDataSinkBase}} will be excluded in {{JobGraph}} * So the JobGraph may contains some JobVertex which has more {{IntermediateDataSet}} than its downstream consumers. Here are some design notes: * Why modify {{DataSet}} and {{JobGraphGenerator}} Since Blink Planner is not ready yet, and Batch Table is running on Flink Planner(based on DataSet). There will be another implementation once Blink Planner is ready. * Why use a special {{OutputFormat}} as placeholder We could add a {{cache()}} method for DataSet, but we do not want to change DataSet API any more. so a special {{OutputFormat}} as placeholder seems reasonable. was: The new {{ResultPartitionType}} : {{BLOCKING_PERSISTENT}} is similar to {{BLOCKING}} except it might be consumed for several times and will be released after TM shutdown or {{ResultPartition}} removal request. This is the basis for Interactive Programming. Here is the brief changes: * Introduce {{ResultPartitionType}} called {{BLOCKING_PERSISTENT}} * Introduce {{BlockingShuffleOutputFormat}} and {{BlockingShuffleDataSink}} which are used to generate {{GenericDataSinkBase}} with user specified {{IntermediateDataSetID}} (passed from TableAPI in later PR) * when {{JobGraphGenerator}} sees a {{GenericDataSinkBase}} with {{IntermediateDataSetID}}, it creates a {{IntermediateDataSet}} with this id, then {{OutputFormatVertex}} for this {{GenericDataSinkBase}} will be excluded in {{JobGraph}} * So the JobGraph may contains some JobVertex which has more {{IntermediateDataSet}} than its downstream consumers. Here are some design notes: * Why modify {{DataSet}} and {{JobGraphGenerator}} Since Blink Planner is not ready yet, and Batch Table is running on Flink Planner(based on DataSet). There will be another implementation once Blink Planner is ready. * Why use a special {{OutputFormat}} as placeholder We could add a {{cache()}} method for DataSet, but we do not want to change DataSet API any more. so a special {{OutputFormat}} as placeholder seems reasonable. > Introduce BLOCKING_PERSISTENT result partition type > --------------------------------------------------- > > Key: FLINK-12405 > URL: https://issues.apache.org/jira/browse/FLINK-12405 > Project: Flink > Issue Type: Sub-task > Components: API / DataSet > Reporter: Ruidong Li > Assignee: Ruidong Li > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The new {{ResultPartitionType}} : {{BLOCKING_PERSISTENT}} is similar to > {{BLOCKING}} except it might be consumed for several times and will be > released after TM shutdown or {{ResultPartition}} removal request. > This is the basis for Interactive Programming. > Here is the brief changes: > * Introduce {{ResultPartitionType}} called {{BLOCKING_PERSISTENT}} > * Introduce {{BlockingShuffleOutputFormat}} which contains a user specified > {{IntermediateDataSetID}}(passed from TableAPI in later PR) > * when {{JobGraphGenerator}} sees a {{GenericDataSinkBase}} with > {{BlockingShuffleOutputFormat}}, it creates a {{IntermediateDataSet}} with > this id, then add it to its predecessor, the {{OutputFormatVertex}} for this > {{GenericDataSinkBase}} will be excluded in {{JobGraph}} > * So the JobGraph may contains some JobVertex which has more > {{IntermediateDataSet}} than its downstream consumers. > Here are some design notes: > * Why modify {{DataSet}} and {{JobGraphGenerator}} > Since Blink Planner is not ready yet, and Batch Table is running on Flink > Planner(based on DataSet). > There will be another implementation once Blink Planner is ready. > * Why use a special {{OutputFormat}} as placeholder > We could add a {{cache()}} method for DataSet, but we do not want to change > DataSet API any more. so a special {{OutputFormat}} as placeholder seems > reasonable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)