[ https://issues.apache.org/jira/browse/FLINK-37639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Selman Kayrancioglu updated FLINK-37639:
----------------------------------------
    Description: 
KeyedStateBootstrapFunction hangs if an exception is thrown within

When an exception occurs within `KeyedStateBootstrapFunction<>.processElement`, the job fails to terminate properly. Instead of failing with an error, the operator appears to hang indefinitely when viewed from the UI or monitoring tools.

I've created a minimal reproducer demonstrating this issue at: [https://github.com/seruman/flink-bootstrap-function-hangs-on-exception-reproducer]

This behavior is consistent across multiple environments - I've confirmed it occurs when:
- Running locally with `start-cluster.sh` and `flink run ...`
- Deploying to Kubernetes with flink-kubernetes-operator
- Executing unit tests with `MiniCluster` (as shown in the repository)

I would expect it to fail with the appropriate exception rather than becoming unresponsive. Notably, there were no error logs generated in either the JobManager or TaskManager.

So far I've tried versions 1.19.1, 1.19.2, 1.20.0.

Sample config:
```
pipeline.max-parallelism=10
parallelism.default=2
execution.runtime-mode=BATCH
execution.batch-shuffle-mode=ALL_EXCHANGES_PIPELINED
jobmanager.scheduler=Default
```

Please let me know if you need any additional information.

  was:
KeyedStateBootstrapFunction hangs if an exception is thrown within

When an exception occurs within `KeyedStateBootstrapFunction<>.processElement`, the job fails to terminate properly. Instead of failing with an error, the operator appears to hang indefinitely when viewed from the UI or monitoring tools.

I've created a minimal reproducer demonstrating this issue at: https://github.com/seruman/flink-bootstrap-function-hangs-on-exception-reproducer

This behavior is consistent across multiple environments - I've confirmed it occurs when:
- Running locally with `start-cluster.sh`
- Deploying to Kubernetes with flink-kubernetes-operator
- Executing unit tests with `MiniCluster` (as shown in the repository)

I would expect it to fail with the appropriate exception rather than becoming unresponsive. Notably, there were no error logs generated in either the JobManager or TaskManager.

So far I've tried versions 1.19.1, 1.19.2, 1.20.0.

Sample config:
```
pipeline.max-parallelism=10
parallelism.default=2
execution.runtime-mode=BATCH
execution.batch-shuffle-mode=ALL_EXCHANGES_PIPELINED
jobmanager.scheduler=Default
```

Please let me know if you need any additional information.


> KeyedStateBootstrapFunction hangs if an exception is thrown within
> ------------------------------------------------------------------
>
> Key: FLINK-37639
> URL: https://issues.apache.org/jira/browse/FLINK-37639
> Project: Flink
> Issue Type: Bug
> Components: API / State Processor
> Affects Versions: 1.20.0, 1.19.1, 1.19.2
> Reporter: Selman Kayrancioglu
> Priority: Major
>
> KeyedStateBootstrapFunction hangs if an exception is thrown within
> When an exception occurs within
> `KeyedStateBootstrapFunction<>.processElement`, the job fails to terminate
> properly. Instead of failing with an error, the operator appears to hang
> indefinitely when viewed from the UI or monitoring tools.
> I've created a minimal reproducer demonstrating this issue at:
> [https://github.com/seruman/flink-bootstrap-function-hangs-on-exception-reproducer]
> This behavior is consistent across multiple environments - I've confirmed it
> occurs when:
> - Running locally with `start-cluster.sh` and `flink run ...`
> - Deploying to Kubernetes with flink-kubernetes-operator
> - Executing unit tests with `MiniCluster` (as shown in the repository)
> I would expect it to fail with the appropriate exception rather than
> becoming unresponsive.
> Notably, there were no error logs generated in either the JobManager or
> TaskManager.
> So far I've tried versions 1.19.1, 1.19.2, 1.20.0.
> Sample config:
> ```
> pipeline.max-parallelism=10
> parallelism.default=2
> execution.runtime-mode=BATCH
> execution.batch-shuffle-mode=ALL_EXCHANGES_PIPELINED
> jobmanager.scheduler=Default
> ```
> Please let me know if you need any additional information.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
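
Because the job described above hangs instead of failing, a unit test that runs it may never return. One way to make the hang visible is a timeout guard around the call that executes the job. The following is only a sketch in plain Java with no Flink dependency: `HangGuard` and `runWithTimeout` are hypothetical names, not part of Flink or of the linked reproducer, and the `Callable` is assumed to wrap whatever call runs the job.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical test helper (not a Flink class): turns a hanging job into a failure. */
final class HangGuard {

    private HangGuard() {}

    /**
     * Runs the job on a separate daemon thread and waits at most `timeout`.
     * If the job throws, its own exception is rethrown to the caller.
     * If the job does not finish in time, a TimeoutException is thrown.
     */
    public static <T> T runWithTimeout(Callable<T> job, Duration timeout) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hang-guard");
            t.setDaemon(true); // a job that never returns must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(job);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // Unwrap so the caller sees the job's original exception.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            throw e;
        } catch (TimeoutException e) {
            // Interrupt the worker thread; this alone does not stop a cluster.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In a test this might be called as `HangGuard.runWithTimeout(() -> env.execute(), Duration.ofMinutes(1))`, where `env` is assumed to be the job's `StreamExecutionEnvironment`. With the behavior reported here the call would be expected to end in a `TimeoutException`, whereas a correctly failing job would surface the exception thrown by the bootstrap function. Note that the guard only interrupts the waiting thread, so a `MiniCluster` used by the test would still need to be shut down separately.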