On Thu, Mar 11, 2021 at 02:14:32PM +0100, Arvid Heise wrote:
> Hi ChangZhuo,
> 
> Did you upgrade to Flink 1.12.2 and change the settings at the time? If so,
> could you maybe reset the settings to the old values on Flink 1.12.2 and
> check if the job still gets stuck? Especially, turning off unaligned
> checkpoints (UC) should clarify if it's a general issue in Flink 1.12.2 or
> with UC.
> 
> If it's indeed an issue with UC, then it would help to get the debug logs
> in particular for the package
> org.apache.flink.streaming.runtime.io.checkpointing. You could add the
> following to your log4j.properties (set general log level to INFO).
> 
> logger.checkpointing.name = org.apache.flink.streaming.runtime.io.checkpointing
> logger.checkpointing.level = DEBUG

* Thanks for the information. We are working on this and will reply
  once we have the logs.

* We also captured a stack trace while the checkpoint was stuck. Please let
  us know if you need the full trace.

  * The stuck task in the UI is KafkaProducer -> ProcessFunction 128.
  * The following is the BLOCKED thread for
    Source: KafkaProducer -> ProcessFunction (129/140)#2:

    "Source: KafkaProducer -> ProcessFunction (129/140)#2" #66336 prio=5 os_prio=0 cpu=582.01ms elapsed=5079.15s tid=0x00007feb32717000 nid=0x9696 waiting for monitor entry  [0x00007feb28b61000]
       java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:92)
        - waiting to lock <0x000000058e8c5070> (a java.lang.Object)
        at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
        at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
        at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:189)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581)
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
        at java.lang.Thread.run(java.base@11.0.8/Thread.java:834)
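
In a full jstack dump, the thread that holds the contended monitor normally carries a "- locked <0x000000058e8c5070>" line among its frames, so that is the line to search for. The following is a minimal sketch for locating that thread in a saved dump. The class name LockOwnerFinder, the method findOwner, and the sample thread "hypothetical-holder" in main are made up for illustration; they are not part of Flink, and the sample dump is not real output.

```java
import java.util.Optional;

class LockOwnerFinder {

    /**
     * Scans a jstack-style thread dump and returns the name of the first
     * thread whose frames contain "- locked <address>", if there is one.
     */
    static Optional<String> findOwner(String dump, String address) {
        String marker = "- locked <" + address + ">";
        String currentThread = null;
        for (String line : dump.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("\"")) {
                // Thread header line: the name is the text between the first two quotes.
                int end = trimmed.indexOf('"', 1);
                if (end > 1) {
                    currentThread = trimmed.substring(1, end);
                }
            } else if (currentThread != null && trimmed.contains(marker)) {
                return Optional.of(currentThread);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical two-thread dump; only the monitor address is taken from the trace above.
        String dump = String.join("\n",
                "\"hypothetical-holder\" #1 prio=5",
                "   java.lang.Thread.State: RUNNABLE",
                "        at Example.work(Example.java:1)",
                "        - locked <0x000000058e8c5070> (a java.lang.Object)",
                "",
                "\"Source: KafkaProducer -> ProcessFunction (129/140)#2\" #66336 prio=5",
                "   java.lang.Thread.State: BLOCKED (on object monitor)",
                "        - waiting to lock <0x000000058e8c5070> (a java.lang.Object)");
        System.out.println(findOwner(dump, "0x000000058e8c5070").orElse("not found"));
    }
}
```

Running the same search over the real dump (or simply grepping it for the "- locked" line) would show which thread is holding the lock that the mailbox thread is waiting for.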

PS:
* The original operator UIDs are redacted and replaced with their underlying types.
* It looks like the subtask ID in the UI is off by one from the stack trace
  (128 in the UI versus 129/140 in the trace).


-- 
ChangZhuo Chen (陳昌倬) czchen@{czchen,debconf,debian}.org
http://czchen.info/
Key fingerprint = BA04 346D C2E1 FE63 C790  8793 CC65 B0CD EC27 5D5B

