[ https://issues.apache.org/jira/browse/FLINK-11909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rong Rong updated FLINK-11909:
------------------------------
    Description:
Currently, Flink Async I/O fails the entire job by default when an async function invocation fails [1].

It would be nice to have a default Async I/O failure/timeout handling strategy, or to open up APIs that let the AsyncFunction timeout method interact with the AsyncWaitOperator. For example (quoting [~suez1224]):
 * FAIL_OPERATOR (default and current behavior)
 * FIX_INTERVAL_RETRY (retry at a configurable fixed interval, up to N times)
 * EXP_BACKOFF_RETRY (retry with exponential backoff, up to N times)

The discussion also extended to introducing configuration options such as:
 * MAX_RETRY_COUNT
 * RETRY_FAILURE_POLICY

REF:
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/asyncio.html#timeout-handling
[2]

> Provide default failure/timeout handling strategy for AsyncIO functions
> -----------------------------------------------------------------------
>
>                 Key: FLINK-11909
>                 URL: https://issues.apache.org/jira/browse/FLINK-11909
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Rong Rong
>            Assignee: Rong Rong
>            Priority: Major
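
The three strategies named in the description could be sketched roughly as follows. This is only an illustration of the proposal, not Flink's API: the class `AsyncRetryPolicy`, its constructor parameters, and the method `nextDelayMillis` are hypothetical names, and the doubling factor for EXP_BACKOFF_RETRY is an assumption. RETRY_FAILURE_POLICY is left out because the issue does not define its semantics.

```java
import java.util.OptionalLong;

/**
 * Hypothetical sketch of the retry strategies proposed in FLINK-11909.
 * Not part of Flink; all names and signatures here are assumptions.
 */
final class AsyncRetryPolicy {

    /** Strategy names taken from the issue description. */
    enum Strategy { FAIL_OPERATOR, FIX_INTERVAL_RETRY, EXP_BACKOFF_RETRY }

    private final Strategy strategy;
    private final int maxRetryCount;        // corresponds to MAX_RETRY_COUNT
    private final long baseIntervalMillis;  // fixed interval, or first backoff step

    AsyncRetryPolicy(Strategy strategy, int maxRetryCount, long baseIntervalMillis) {
        if (maxRetryCount < 0 || baseIntervalMillis < 0) {
            throw new IllegalArgumentException("maxRetryCount and baseIntervalMillis must be >= 0");
        }
        this.strategy = strategy;
        this.maxRetryCount = maxRetryCount;
        this.baseIntervalMillis = baseIntervalMillis;
    }

    /**
     * Returns the delay to wait before retry number {@code attempt} (1-based),
     * or an empty value if no retry should happen and the operator should fail.
     */
    OptionalLong nextDelayMillis(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt is 1-based");
        }
        if (strategy == Strategy.FAIL_OPERATOR || attempt > maxRetryCount) {
            return OptionalLong.empty();
        }
        if (strategy == Strategy.FIX_INTERVAL_RETRY) {
            return OptionalLong.of(baseIntervalMillis);
        }
        // EXP_BACKOFF_RETRY: base * 2^(attempt - 1); the factor 2 is an assumption.
        int shift = Math.min(attempt - 1, 62);
        if (baseIntervalMillis > (Long.MAX_VALUE >> shift)) {
            return OptionalLong.of(Long.MAX_VALUE); // saturate instead of overflowing
        }
        return OptionalLong.of(baseIntervalMillis << shift);
    }
}
```

Under these assumptions, a caller such as a timeout handler would ask the policy for the next delay after each failed attempt and fail the operator once the result is empty. For instance, `new AsyncRetryPolicy(AsyncRetryPolicy.Strategy.EXP_BACKOFF_RETRY, 3, 100L)` yields delays of 100, 200, and 400 ms for attempts 1 to 3 and no delay for attempt 4.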