[ https://issues.apache.org/jira/browse/FLINK-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-20886:
-----------------------------------
    Labels: auto-deprioritized-major auto-deprioritized-minor pull-request-available stale-assigned usability  (was: auto-deprioritized-major auto-deprioritized-minor stale-assigned usability)

> Add the option to get a threaddump on checkpoint timeouts
> ---------------------------------------------------------
>
>                 Key: FLINK-20886
>                 URL: https://issues.apache.org/jira/browse/FLINK-20886
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>            Reporter: Nico Kruber
>            Assignee: Zakelly Lan
>            Priority: Minor
>              Labels: auto-deprioritized-major, auto-deprioritized-minor,
> pull-request-available, stale-assigned, usability
>
> For debugging checkpoint timeouts, I was thinking about the following
> addition to Flink:
> When a checkpoint times out and the async thread is still running, create a
> thread dump [1] and either add it to the checkpoint stats, log it, or write
> it out.
> This may help identify where the checkpoint is stuck (maybe on a lock, or
> possibly inside a third-party library such as the FS connectors). It would
> give us some insight into what the thread is currently doing.
> Limiting the scope of the threads would be nice but may not be possible in
> the general case, since additional threads (spawned by the FS connector lib,
> or otherwise connected) may interact with the async thread(s), e.g. by going
> through the same locks. Maybe we can reduce the thread dump to all async
> threads of the failed checkpoint plus all threads that interact with them,
> e.g. via locks?
> I'm also not sure whether the ability to take thread dumps should be
> user-configurable. (Could a dump contain sensitive information from other
> jobs if you run a session cluster? Is that even relevant, since we don't give
> isolation guarantees anyway?) If it is configurable, it should be on by
> default.
>
> [1] https://crunchify.com/how-to-generate-java-thread-dump-programmatically/
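
To make the "async threads plus the threads they interact with via locks" idea concrete, here is a minimal sketch using only the JDK's `ThreadMXBean`. It is not Flink code and not the implementation proposed for this ticket: the class name `CheckpointThreadDumpSketch`, both method names, and the name-based predicate for picking the async threads are assumptions made for illustration. It follows only one direction of lock interaction (from a blocked thread to the owner of the lock it is waiting for), so threads that are themselves blocked on locks held by the async threads are not included.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helper for illustration only; not part of Flink. */
final class CheckpointThreadDumpSketch {

    /**
     * Returns the threads whose name is accepted by isAsyncThread plus, transitively,
     * the owners of the locks those threads are currently waiting for.
     */
    static Map<Long, ThreadInfo> dumpWithLockOwners(Predicate<String> isAsyncThread) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Request monitor/synchronizer details only where the JVM supports them.
        ThreadInfo[] all = bean.dumpAllThreads(
                bean.isObjectMonitorUsageSupported(),
                bean.isSynchronizerUsageSupported());

        Map<Long, ThreadInfo> byId = new LinkedHashMap<>();
        for (ThreadInfo info : all) {
            if (info != null) {
                byId.put(info.getThreadId(), info);
            }
        }

        // Seed the work list with the threads selected by name.
        Deque<Long> pending = new ArrayDeque<>();
        for (ThreadInfo info : byId.values()) {
            if (isAsyncThread.test(info.getThreadName())) {
                pending.add(info.getThreadId());
            }
        }

        // Follow "waiting for a lock owned by" edges until no new thread shows up.
        Map<Long, ThreadInfo> selected = new LinkedHashMap<>();
        while (!pending.isEmpty()) {
            long id = pending.poll();
            ThreadInfo info = byId.get(id);
            if (info == null || selected.containsKey(id)) {
                continue;
            }
            selected.put(id, info);
            long owner = info.getLockOwnerId(); // -1: not blocked, or lock has no owner
            if (owner != -1) {
                pending.add(owner);
            }
        }
        return selected;
    }

    /** Renders thread name, state, and full stack trace as plain text. */
    static String render(Map<Long, ThreadInfo> threads) {
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : threads.values()) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in predicate: a real caller would match the async checkpoint threads.
        String self = Thread.currentThread().getName();
        System.out.print(render(dumpWithLockOwners(self::equals)));
    }
}
```

The rendered string could then be logged or attached to the checkpoint stats, as the description suggests. Selecting threads by name is only one option; whether Flink can identify the async threads of a specific failed checkpoint more precisely is left open here.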