On Thu, Mar 11, 2021 at 02:14:32PM +0100, Arvid Heise wrote:
> Hi ChangZhuo,
>
> Did you upgrade to Flink 1.12.2 and change the settings at the time? If so,
> could you maybe reset the settings to the old values on Flink 1.12.2 and
> check if the job still gets stuck? Especially, turning off unaligned
> checkpoints (UC) should clarify if it's a general issue in Flink 1.12.2 or
> with UC.
>
> If it's indeed an issue with UC, then it would help to get the debug logs
> in particular for the package
> org.apache.flink.streaming.runtime.io.checkpointing. You could add the
> following to your log4js.properties (set general log level to INFO).
>
> logger.checkpointing.name = org.apache.flink.streaming.runtime.io.checkpointing
> logger.checkpointing.level = DEBUG

* Thanks for this information. We are working on it and will reply once we
  have the logs.

* We also captured a stack trace while the checkpoint was stuck. Please let
  us know if you need the full trace.

* The stuck task shown in the UI is KafkaProducer -> ProcessFunction 128.

* The following is the BLOCKED thread for Source: KafkaProducer ->
  ProcessFunction (129/140)#2:

"Source: KafkaProducer -> ProcessFunction (129/140)#2" #66336 prio=5 os_prio=0 cpu=582.01ms elapsed=5079.15s tid=0x00007feb32717000 nid=0x9696 waiting for monitor entry [0x00007feb28b61000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:92)
        - waiting to lock <0x000000058e8c5070> (a java.lang.Object)
        at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
        at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
        at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:189)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581)
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
        at java.lang.Thread.run(java.base@11.0.8/Thread.java:834)

ps:

* The original UIDs have been redacted and replaced with their underlying
  types.

* It looks like the subtask ID in the UI is off by one relative to the stack
  trace (128 in the UI, 129 in the stack trace).

-- 
ChangZhuo Chen (陳昌倬) czchen@{czchen,debconf,debian}.org
http://czchen.info/
Key fingerprint = BA04 346D C2E1 FE63 C790 8793 CC65 B0CD EC27 5D5B