Hi!
Sorry for not following up on this. It turned out some ports were blocked by
some random firewall change, so there is no issue on Flink's side.
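
For anyone hitting something similar, a quick TCP reachability check run from
the slow client machine can help narrow it down. This is only a minimal
sketch: the host name below is a placeholder, and the two ports are simply the
ones that show up in the quoted log, not a complete list of what a Flink on
YARN deployment needs.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Placeholder host; the ports are examples taken from the quoted log.
    results = check_ports("jobmanager.example.com", [8030, 56045])
    for port, ok in sorted(results.items()):
        print(f"{port}: {'open' if ok else 'blocked or closed'}")
```

Running the same script from a fast and a slow client and comparing the output
should show whether a firewall sits between the slow machine and the cluster.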

Gyula

On Fri, Mar 16, 2018, 17:41 Aljoscha Krettek <aljos...@apache.org> wrote:

> Hi Gyula,
>
> Is there any news on this?
>
> @Nico or @Gary you recently also did stuff with YARN, do you maybe have an
> idea of what could be going on?
>
> Best,
> Aljoscha
>
> > On 21. Nov 2017, at 06:42, Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> > Hi all!
> >
> > Today we started noticing that deploying our jobs took over 3 minutes when
> > deployed from some machine and normal (few seconds) when deployed from the
> > others.
> >
> > Looking at the logs, it seems that the client can't find some job id for a
> > few minutes in this case:
> >
> > ...
> > 2017-11-21 15:23:00,880 DEBUG org.apache.flink.yarn.YarnJobManager - Job with ID 179d67bfab7c4c0b9f00ea772f6e4f0c not found in JobManager
> > 2017-11-21 15:23:04,528 DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x25eb8e005b7971b after 0ms
> > 2017-11-21 15:23:04,636 DEBUG org.apache.hadoop.ipc.Client - IPC Client (937277082) connection to splat13.sto.midasplayer.com/172.26.87.155:8030 from splat sending #38
> > 2017-11-21 15:23:04,636 DEBUG org.apache.hadoop.ipc.Client - IPC Client (937277082) connection to splat13.sto.midasplayer.com/172.26.87.155:8030 from splat got value #38
> > 2017-11-21 15:23:04,651 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: allocate took 16ms
> > 2017-11-21 15:23:05,880 DEBUG org.apache.flink.yarn.YarnJobManager - Job with ID 179d67bfab7c4c0b9f00ea772f6e4f0c not found in JobManager
> > 2017-11-21 15:23:06,409 DEBUG akka.remote.RemoteWatcher - Sending Heartbeat to [akka.tcp://fl...@splat33.sto.midasplayer.com:56045]
> > 2017-11-21 15:23:06,413 DEBUG akka.remote.RemoteWatcher - Received heartbeat rsp from [akka.tcp://fl...@splat33.sto.midasplayer.com:56045]
> > 2017-11-21 15:23:07,665 DEBUG akka.serialization.Serialization(akka://flink) - Using serializer[akka.serialization.JavaSerializer] for message [org.apache.flink.runtime.clusterframework.messages.GetClusterStatusResponse]
> > 2017-11-21 15:23:07,824 INFO  org.apache.flink.yarn.YarnJobManager - Submitting job 179d67bfab7c4c0b9f00ea772f6e4f0c (event-bifrost-log).
> > 2017
> >
> > Interestingly enough, nothing like this shows when deployed from other
> > servers.
> > We suspect there might be some strange network issue (which doesn't seem to
> > affect jar upload times) that screws with Akka in some way.
> >
> > Any idea how to debug this?
> > Thank you!
> >
> > Gyula
>
>
