Hi Gyula,

Is there any news on this?

@Nico or @Gary, you also worked with YARN recently. Do you have an idea of what could be going on?

Best,
Aljoscha

> On 21. Nov 2017, at 06:42, Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> Hi all!
>
> Today we started noticing that deploying our jobs took over 3 minutes when
> deployed from some machine and normal (few seconds) when deployed from the
> others.
>
> Looking at the logs, it seems that the client can't find some job ID for a
> few minutes in this case:
>
> ...
> 2017-11-21 15:23:00,880 DEBUG org.apache.flink.yarn.YarnJobManager - Job with ID 179d67bfab7c4c0b9f00ea772f6e4f0c not found in JobManager
> 2017-11-21 15:23:04,528 DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x25eb8e005b7971b after 0ms
> 2017-11-21 15:23:04,636 DEBUG org.apache.hadoop.ipc.Client - IPC Client (937277082) connection to splat13.sto.midasplayer.com/172.26.87.155:8030 from splat sending #38
> 2017-11-21 15:23:04,636 DEBUG org.apache.hadoop.ipc.Client - IPC Client (937277082) connection to splat13.sto.midasplayer.com/172.26.87.155:8030 from splat got value #38
> 2017-11-21 15:23:04,651 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: allocate took 16ms
> 2017-11-21 15:23:05,880 DEBUG org.apache.flink.yarn.YarnJobManager - Job with ID 179d67bfab7c4c0b9f00ea772f6e4f0c not found in JobManager
> 2017-11-21 15:23:06,409 DEBUG akka.remote.RemoteWatcher - Sending Heartbeat to [akka.tcp://fl...@splat33.sto.midasplayer.com:56045]
> 2017-11-21 15:23:06,413 DEBUG akka.remote.RemoteWatcher - Received heartbeat rsp from [akka.tcp://fl...@splat33.sto.midasplayer.com:56045]
> 2017-11-21 15:23:07,665 DEBUG akka.serialization.Serialization(akka://flink) - Using serializer[akka.serialization.JavaSerializer] for message [org.apache.flink.runtime.clusterframework.messages.GetClusterStatusResponse]
> 2017-11-21 15:23:07,824 INFO org.apache.flink.yarn.YarnJobManager - Submitting job 179d67bfab7c4c0b9f00ea772f6e4f0c (event-bifrost-log).
> 2017
>
> Interestingly enough, nothing like this shows up when deployed from other
> servers.
> We suspect there might be some strange network issue (which doesn't seem to
> affect jar upload times) that interferes with akka in some way.
>
> Any idea how to debug this?
> Thank you!
>
> Gyula