The leader znode is the right one (its payload is binary):

get
/flink_test/da_15/leader/00000000000000000000000000000000/job_manager_lock

wFDakka.tcp://
[email protected]:22161/user/jobmanagersrjava.util.UUIDm/J


                      leastSigBitsJ


                                  mostSigBitsxpHv


So it does (I think) resolve the right leader of the HA setup, but I cannot
tell what happens from there: sadly, the DEBUG logs do not expose which
server the client hits.
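
To double-check what the znode above holds, here is a minimal sketch that decodes such a payload. It assumes the bytes are a Java ObjectOutputStream stream containing writeUTF(leaderAddress) followed by writeObject(leaderSessionId). That layout is an inference from the dump (an akka.tcp:// string followed by a serialized java.util.UUID), not something confirmed in this thread. The class name LeaderZnodeDecoder and the host jm-host.example are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.UUID;

public class LeaderZnodeDecoder {

    /** Leader address plus leader session id, as read from the znode payload. */
    public static final class LeaderInfo {
        public final String address;
        public final UUID sessionId;

        LeaderInfo(String address, UUID sessionId) {
            this.address = address;
            this.sessionId = sessionId;
        }
    }

    /** Builds a payload in the ASSUMED layout: writeUTF(address), then writeObject(sessionId). */
    public static byte[] encode(String address, UUID sessionId) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeUTF(address);
            out.writeObject(sessionId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before the bytes are copied.
        return bytes.toByteArray();
    }

    /** Reads the address and session id back, in the same order they were written. */
    public static LeaderInfo decode(byte[] payload) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            String address = in.readUTF();
            UUID sessionId = (UUID) in.readObject();
            return new LeaderInfo(address, sessionId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Unexpected class in payload", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical values; a real payload would be the raw bytes fetched from the znode.
        byte[] payload = encode(
                "akka.tcp://flink@jm-host.example:22161/user/jobmanager", UUID.randomUUID());
        LeaderInfo info = decode(payload);
        System.out.println(info.address + " " + info.sessionId);
    }
}
```

To use it against the real cluster, the bytes would have to be fetched from the job_manager_lock znode with a ZooKeeper client and passed to decode(); the round trip in main() only shows that the assumed layout reads back cleanly.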


On Tue, Jun 26, 2018 at 9:57 AM, Vishal Santoshi <[email protected]>
wrote:

> OK few things
>
> 2018-06-26 13:31:29 INFO  CliFrontend:282 -  Starting Command Line Client
> (Version: 1.5.0, Rev:c61b108, Date:24.05.2018 @ 14:54:44 UTC)
>
> ...
>
> 2018-06-26 13:31:31 INFO  ClientCnxn:876 - Socket connection established
> to zk-f1fb95b9.bf2.tumblr.net/10.246.218.17:2181, initiating session
>
> 2018-06-26 13:31:31 DEBUG ClientCnxn:949 - Session establishment request
> sent on zk-f1fb95b9.bf2.tumblr.net/10.246.218.17:2181
>
> 2018-06-26 13:31:31 INFO  ClientCnxn:1299 - Session establishment
> complete on server zk-f1fb95b9.bf2.tumblr.net/10.246.218.17:2181,
> sessionid = 0x35add547801ea07, negotiated timeout = 40000
>
> 2018-06-26 13:31:31 INFO  RestClient:119 - Rest client endpoint started.
>
> 2018-06-26 13:31:31 INFO  ZooKeeperLeaderRetrievalService:100 - Starting
> ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
>
> 2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply
> sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null
> finished:false header:: 1,3  replyHeader:: 1,60416530560,0  request::
> '/flink_test,F  response:: s{47265479496,47265479496,
> 1489163688703,1489163688703,0,2,0,0,0,2,60416492885}
>
> 2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply
> sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null
> finished:false header:: 2,3  replyHeader:: 2,60416530560,0  request::
> '/flink_test/da_15,F  response:: s{60416492885,60416492885,
> 1529755199131,1529755199131,0,5,0,0,0,5,60416521584}
>
> 2018-06-26 13:31:31 INFO  ZooKeeperLeaderRetrievalService:100 - Starting
> ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
>
> 2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply
> sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null
> finished:false header:: 3,3  replyHeader:: 3,60416530560,0  request::
> '/flink_test,F  response:: s{47265479496,47265479496,
> 1489163688703,1489163688703,0,2,0,0,0,2,60416492885}
>
> 2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply
> sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null
> finished:false header:: 4,3  replyHeader:: 4,60416530560,0  request::
> '/flink_test/da_15,F  response:: s{60416492885,60416492885,
> 1529755199131,1529755199131,0,5,0,0,0,5,60416521584}
>
> 2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply
> sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null
> finished:false header:: 5,3  replyHeader:: 5,60416530560,0  request::
> '/flink_test/da_15/leader,F  response:: s{60416492887,60416492887,
> 1529755199191,1529755199191,0,1,0,0,0,1,60416492888}
>
> 2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply
> sessionid:0x35add547801ea07, packet:: 
> clientPath:/flink_test/da_15/leader/rest_server_lock
> serverPath:/flink_test/da_15/leader/rest_server_lock finished:false
> header:: 6,3  replyHeader:: 6,60416530560,-101  request::
> '/flink_test/da_15/leader/rest_server_lock,T  response::
>
> 2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply
> sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null
> finished:false header:: 7,3  replyHeader:: 7,60416530560,0  request::
> '/flink_test,F  response:: s{47265479496,47265479496,
> 1489163688703,1489163688703,0,2,0,0,0,2,60416492885}
>
> 2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply
> sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null
> finished:false header:: 8,3  replyHeader:: 8,60416530560,0  request::
> '/flink_test/da_15,F  response:: s{60416492885,60416492885,
> 1529755199131,1529755199131,0,5,0,0,0,5,60416521584}
>
> 2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply
> sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null
> finished:false header:: 9,3  replyHeader:: 9,60416530560,0  request::
> '/flink_test/da_15/leader,F  response:: s{60416492887,60416492887,
> 1529755199191,1529755199191,0,1,0,0,0,1,60416492888}
>
> 2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply
> sessionid:0x35add547801ea07, packet:: 
> clientPath:/flink_test/da_15/leader/dispatcher_lock
> serverPath:/flink_test/da_15/leader/dispatcher_lock finished:false
> header:: 10,3  replyHeader:: 10,60416530560,-101  request::
> '/flink_test/da_15/leader/dispatcher_lock,T  response::
>
> 2018-06-26 13:31:31 INFO  CliFrontend:914 - Waiting for response...
>
> Waiting for response...
>
> 2018-06-26 13:31:44 DEBUG ClientCnxn:742 - Got ping response for
> sessionid: 0x35add547801ea07 after 0ms
>
> 2018-06-26 13:31:58 DEBUG ClientCnxn:742 - Got ping response for
> sessionid: 0x35add547801ea07 after 0ms
>
> 2018-06-26 13:32:01 INFO  RestClient:123 - Shutting down rest endpoint.
>
> 2018-06-26 13:32:01 INFO  RestClient:140 - Rest endpoint shutdown
> complete.
>
> 2018-06-26 13:32:01 INFO  ZooKeeperLeaderRetrievalService:117 - Stopping
> ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
>
> 2018-06-26 13:32:01 INFO  ZooKeeperLeaderRetrievalService:117 - Stopping
> ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
>
> 2018-06-26 13:32:01 DEBUG CuratorFrameworkImpl:282 - Closing
>
> 2018-06-26 13:32:01 INFO  CuratorFrameworkImpl:821 -
> backgroundOperationsLoop exiting
>
> 2018-06-26 13:32:01 DEBUG CuratorZookeeperClient:199 - Closing
>
> 2018-06-26 13:32:01 DEBUG ConnectionState:115 - Closing
>
> 2018-06-26 13:32:01 DEBUG ZooKeeper:673 - Closing session:
> 0x35add547801ea07
>
> 2018-06-26 13:32:01 DEBUG ClientCnxn:1370 - Closing client for session:
> 0x35add547801ea07
>
> 2018-06-26 13:32:01 DEBUG ClientCnxn:843 - Reading reply
> sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null
> finished:false header:: 11,-11  replyHeader:: 11,60416530561,0  request::
> null response:: null
>
> 2018-06-26 13:32:01 DEBUG ClientCnxn:1354 - Disconnecting client for
> session: 0x35add547801ea07
>
> 2018-06-26 13:32:01 INFO  ZooKeeper:684 - Session: 0x35add547801ea07
> closed
>
> 2018-06-26 13:32:01 INFO  ClientCnxn:519 - EventThread shut down for
> session: 0x35add547801ea07
>
> 2018-06-26 13:32:01 DEBUG ClientCnxn:1146 - An exception was thrown while
> closing send thread for session 0x35add547801ea07 : Unable to read
> additional data from server sessionid 0x35add547801ea07, likely server has
> closed socket
>
> 2018-06-26 13:32:01 ERROR CliFrontend:891 - Error while running the
> command.
>
> org.apache.flink.util.FlinkException: Failed to retrieve job list.
>
> at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:429)
>
> at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:412)
>
>
> On Tue, Jun 26, 2018 at 5:43 AM, zhangminglei <[email protected]> wrote:
>
>> By the way, in HA set up.
>>
>> On Jun 26, 2018, at 5:39 PM, zhangminglei <[email protected]> wrote:
>>
>> Hi, Gary Yao
>>
>> I once discovered that the IP address [jobmanager.rpc.address] had
>> changed from 10.208.73.129 to localhost. I think that would cause the
>> issue. What do you think?
>>
>> Cheers
>> Minglei
>>
>> On Jun 26, 2018, at 4:53 PM, Gary Yao <[email protected]> wrote:
>>
>> Hi Vishal,
>>
>> Could it be that you are not using the 1.5.0 client? The stacktrace you
>> posted does not reference valid lines of code in the release-1.5.0-rc6 tag.
>>
>> If you have a HA setup, the host and port of the leading JM will be looked
>> up from ZooKeeper before job submission. Therefore, the flink-conf.yaml
>> used by the client must have the same ZooKeeper configuration as used by
>> the Flink cluster.
>>
>> Best,
>> Gary
>>
>> On Mon, Jun 25, 2018 at 5:32 PM, Vishal Santoshi <
>> [email protected]> wrote:
>>
>>> I think all I need to add is
>>>
>>> web.port: 8081
>>> rest.port: 8081
>>>
>>> to the JM flink conf ?
>>>
>>> On Mon, Jun 25, 2018 at 10:46 AM, Vishal Santoshi <
>>> [email protected]> wrote:
>>>
>>>> Another issue I saw with flink cli...
>>>>
>>>> org.apache.flink.client.program.ProgramInvocationException: The
>>>> program execution failed: JobManager did not respond within 120000 ms
>>>> at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:524)
>>>> at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:103)
>>>> at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:456)
>>>> at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
>>>> at org.apach
>>>>
>>>> This was a simple submission  and it does succeed through the UI.
>>>>
>>>> Has there been a regression on CLI... I could not find any
>>>> documentation around it.
>>>>
>>>> I have a HA JM setup.
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 25, 2018 at 10:22 AM, Chesnay Schepler <[email protected]>
>>>> wrote:
>>>>
>>>>> The watermark issue is known and will be fixed in 1.5.1.
>>>>>
>>>>>
>>>>> On 25.06.2018 15:03, Vishal Santoshi wrote:
>>>>>
>>>>> Thank you....
>>>>>
>>>>> One addition
>>>>>
>>>>> I do not see WM info on the UI  ( Attached )
>>>>>
>>>>> Is this a known issue? The same pipe on our production has the WM (in
>>>>> fact, we never had an issue with watermarks not appearing). Am I missing
>>>>> something?
>>>>>
>>>>> On Mon, Jun 25, 2018 at 4:15 AM, Fabian Hueske <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Vishal,
>>>>>>
>>>>>> 1. I don't think a rolling update is possible. Flink 1.5.0 changed
>>>>>> the process orchestration and how they communicate. IMO, the way to go is
>>>>>> to start a Flink 1.5.0 cluster, take a savepoint on the running job,
>>>>>> start from the savepoint on the new cluster and shut the old job down.
>>>>>> 2. Savepoints should be compatible.
>>>>>> 3. You can keep the slot configuration as before.
>>>>>> 4. As I said before, mixing 1.5 and 1.4 processes does not work (or
>>>>>> at least, it was not considered a design goal and nobody paid attention
>>>>>> that it is possible).
>>>>>>
>>>>>> Best, Fabian
>>>>>>
>>>>>>
>>>>>> 2018-06-23 13:38 GMT+02:00 Vishal Santoshi <[email protected]
>>>>>> >:
>>>>>>
>>>>>>>
>>>>>>> 1.
>>>>>>> Has anyone done a rolling upgrade from 1.4 to 1.5? I am not sure we
>>>>>>> can. It seems that the JM cannot recover jobs and fails with this
>>>>>>> exception:
>>>>>>>
>>>>>>> Caused by: java.io.InvalidClassException:
>>>>>>> org.apache.flink.runtime.jobgraph.tasks.CheckpointCoordinatorConfiguration;
>>>>>>> local class incompatible: stream classdesc serialVersionUID =
>>>>>>> -647384516034982626, local class serialVersionUID = 2
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2.
>>>>>>> Does a savepoint (SP) taken on 1.4 resume on 1.5 (pretty basic, but no harm in asking)?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 3.
>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/release-notes/flink-1.5.html#update-configuration-for-reworked-job-deployment
>>>>>>> The taskmanager.numberOfTaskSlots: what would be the desired setting
>>>>>>> in a standalone (non-Mesos/YARN) cluster?
>>>>>>>
>>>>>>>
>>>>>>> 4. I suspended all jobs and installed 1.5 on the JM (the TMs are
>>>>>>> still running 1.4). The JM refuses to start, with:
>>>>>>>
>>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]:
>>>>>>> 2018-06-23 11:34:23 ERROR JobManager:116 - Failed to recover job
>>>>>>> 454cd84a519f3b50e88bcb378d8a1330.
>>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]:
>>>>>>> java.lang.InstantiationError: org.apache.flink.runtime.blob.BlobKey
>>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at
>>>>>>> sun.reflect.GeneratedSerializationConstructorAccessor51.newInstance(Unknown
>>>>>>> Source)
>>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at
>>>>>>> java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at
>>>>>>> java.io.ObjectStreamClass.newInstance(ObjectStreamClass.java:1079)
>>>>>>> Jun
>>>>>>> .....
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Any feedback would be highly appreciated...
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>
