[ 
https://issues.apache.org/jira/browse/FLINK-37895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956132#comment-17956132
 ] 

Gyula Fora commented on FLINK-37895:
------------------------------------

[~sverma] could you please take a look?

> "Failed to fetch job exceptions from REST API for jobId" errors for session 
> jobs
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-37895
>                 URL: https://issues.apache.org/jira/browse/FLINK-37895
>             Project: Flink
>          Issue Type: Bug
>         Environment: K8s 1.32 on arm64 nodes
> Flink kubernetes operator 1.12.0
> Flink 1.19.1
>            Reporter: Sebastian Struß
>            Priority: Major
>
> The Flink Kubernetes Operator, version 1.12.0, has started to print error 
> messages like this:
> ```
> {"timeMillis":1749046856604,"thread":"ReconcilerExecutor-flinksessionjobcontroller-84","level":"WARN","loggerName":"org.apache.flink.kubernetes.operator.service.AbstractFlinkService","message":"Failed
>  to fetch job exceptions from REST API for jobId 
> 56bdbb2095a14bb40d154cf0a3ba4659","thrown":{"commonElementCount":0,"localizedMessage":"java.net.UnknownHostException:
>  parquetizer-xyz-rest.parquetizers: Name or service not 
> known","message":"java.net.UnknownHostException: 
> parquetizer-xyz-rest.parquetizers: Name or service not 
> known","name":"java.util.concurrent.ExecutionException","cause":{"commonElementCount":1,"localizedMessage":"parquetizer-xyz-rest.parquetizers:
>  Name or service not known","message":"parquetizer-xyz-rest.parquetizers: 
> Name or service not 
> known","name":"java.net.UnknownHostException","extendedStackTrace":"java.net.UnknownHostException:
>  parquetizer-xyz-rest.parquetizers: Name or service not known\n\tat 
> java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) ~[?:?]\n\tat 
> java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source) 
> ~[?:?]\n\tat java.net.InetAddress.getAddressesFromNameService(Unknown Source) 
> ~[?:?]\n\tat java.net.InetAddress$NameServiceAddresses.get(Unknown Source) 
> ~[?:?]\n\tat java.net.InetAddress.getAllByName0(Unknown Source) ~[?:?]\n\tat 
> java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat 
> java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat 
> java.net.InetAddress.getByName(Unknown Source) ~[?:?]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> java.security.AccessController.doPrivileged(Native Method) ~[?:?]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  
> ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n"},"extendedStackTrace":"java.util.concurrent.ExecutionException:
>  java.net.UnknownHostException: parquetizer-xyz-rest.parquetizers: Name or 
> service not known\n\tat 
> java.util.concurrent.CompletableFuture.reportGet(Unknown Source) ~[?:?]\n\tat 
> java.util.concurrent.CompletableFuture.get(Unknown Source) ~[?:?]\n\tat 
> org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getJobExceptions(AbstractFlinkService.java:873)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.kubernetes.operator.observer.JobStatusObserver.observeJobManagerExceptions(JobStatusObserver.java:131)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.kubernetes.operator.observer.JobStatusObserver.observe(JobStatusObserver.java:97)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.kubernetes.operator.observer.sessionjob.FlinkSessionJobObserver.observeInternal(FlinkSessionJobObserver.java:54)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:49)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:113)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:59)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:153)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:111)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:110)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:136)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:117)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:452)
>  [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]\n\tat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> [?:?]\n\tat java.lang.Thread.run(Unknown Source) [?:?]\nCaused by: 
> java.net.UnknownHostException: parquetizer-xyz-rest.parquetizers: Name or 
> service not known\n\tat java.net.Inet6AddressImpl.lookupAllHostAddr(Native 
> Method) ~[?:?]\n\tat 
> java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source) 
> ~[?:?]\n\tat java.net.InetAddress.getAddressesFromNameService(Unknown Source) 
> ~[?:?]\n\tat java.net.InetAddress$NameServiceAddresses.get(Unknown Source) 
> ~[?:?]\n\tat java.net.InetAddress.getAllByName0(Unknown Source) ~[?:?]\n\tat 
> java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat 
> java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat 
> java.net.InetAddress.getByName(Unknown Source) ~[?:?]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> java.security.AccessController.doPrivileged(Native Method) ~[?:?]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\t... 1 
> more\n"},"endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{"resource.apiVersion":"flink.apache.org/v1beta1","resource.generation":"2","resource.kind":"FlinkSessionJob","resource.name":"parquetizer-xyz","resource.namespace":"parquetizers","resource.resourceVersion":"1904237678","resource.uid":"736550c3-dc52-4a0b-8124-f873d02f5d53"},"threadId":84,"threadPriority":5},
> ```
>  
> We didn't see those with flink-kubernetes-operator 1.11.0.
>  
> It seems that the operator tries to reach a service inside the cluster whose 
> name is derived from the job's name.
> Since I am using a session cluster here, I would expect it to reach out to the 
> session cluster and query it for exception logs. Am I wrong?
> However, that service doesn't exist (and never did), hence the error message.
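
For illustration only: the host in the log, parquetizer-xyz-rest.parquetizers, has the shape <name>-rest.<namespace> with <name> equal to the FlinkSessionJob's resource.name. The sketch below is not operator code. The helper restHost and the session cluster name my-session-cluster are hypothetical, and the naming pattern is an assumption inferred from the log above. It only shows how using the session job's name instead of the session cluster's name would produce the unresolvable host.

```java
// Hypothetical sketch, not taken from the operator's source code.
class RestHostSketch {

    // Assumed naming pattern, inferred from the log: "<name>-rest.<namespace>".
    static String restHost(String name, String namespace) {
        return name + "-rest." + namespace;
    }

    public static void main(String[] args) {
        String namespace = "parquetizers";                 // resource.namespace in the log
        String sessionJobName = "parquetizer-xyz";         // resource.name in the log
        String sessionClusterName = "my-session-cluster";  // hypothetical, not in the report

        // Host built from the session job's name: the one that fails to resolve.
        System.out.println(restHost(sessionJobName, namespace));

        // Host built from the session cluster's name: what the reporter expects to be queried.
        System.out.println(restHost(sessionClusterName, namespace));
    }
}
```

If the assumption holds, the first line printed matches the host in the UnknownHostException, which would be consistent with the reporter's reading that the job name is used where the session cluster name was expected.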



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
