Hi,
We are running a Flink cluster with 2 JMs in HA and 2 TMs on a standalone K8
cluster. After migrating to 1.14.3, we started to see some exceptions in the JM
logs:
2022-02-15 11:30:00,100 ERROR
org.apache.flink.runtime.rest.handler.job.JobIdsHandler [] POD_NAME:
eric-bss-em-sm-streamserver-jobmanager-868fd68b5d-zs9pv - Unhandled
exception.org.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException: Failed
to serialize the result for RPC call : requestMultipleJobDetails. at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:417)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$sendAsyncResponse$2(AkkaRpcActor.java:373)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
~[?:1.8.0_321] at
java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:848)
~[?:1.8.0_321] at
java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2168)
~[?:1.8.0_321] at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.sendAsyncResponse(AkkaRpcActor.java:365)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:332)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
akka.actor.Actor.aroundReceive(Actor.scala:537)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
akka.actor.Actor.aroundReceive$(Actor.scala:535)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
akka.actor.ActorCell.invoke(ActorCell.scala:548)
[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
akka.dispatch.Mailbox.run(Mailbox.scala:231)
[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
akka.dispatch.Mailbox.exec(Mailbox.scala:243)
[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_321]
at
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1067)
[?:1.8.0_321] at
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1703)
[?:1.8.0_321] at
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:172)
[?:1.8.0_321]Caused by: java.io.NotSerializableException:
java.util.HashMap$Values at
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
~[?:1.8.0_321] at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
~[?:1.8.0_321] at
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
~[?:1.8.0_321] at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
~[?:1.8.0_321] at
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
~[?:1.8.0_321] at
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
~[?:1.8.0_321] at
org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:632)
~[flink-dist_2.11-1.14.3.jar:1.14.3] at
org.apache.flink.runtime.rpc.akka.AkkaRpcSerializedValue.valueOf(AkkaRpcSerializedValue.java:66)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:400)
~[flink-rpc-akka_e7ab3036-42d0-412d-9a01-e55694061b27.jar:1.14.3] ...
29 more
Not sure whether this requestMultipleJobDetails API is being consumed by the
Flink GUI or my internal job sync task (we consume this API periodically for
dashboarding).
Any help is much appreciated. Thank you.
Chirag