Hi Vishal,

The znode /flink_test/da_15/leader/rest_server_lock should exist as long as your Flink 1.5 cluster is running. In 1.4 this znode will not be created. Are you sure that the znode does not exist? Unfortunately you only attached the output of "ls /flink_test/da_15".
Can you share the complete JobManager log files from a cluster that is (re-)starting?

Best,
Gary

On Thu, Jun 28, 2018 at 4:10 PM, Vishal Santoshi <[email protected]> wrote:

> I am not seeing rest_server_lock. Is it transient (an ephemeral znode)
> for the duration of the CLI command?
>
> [zk: localhost:2181(CONNECTED) 2] ls /flink_test/da_15
> [jobgraphs, leader, checkpoints, leaderlatch, checkpoint-counter]
>
> The logs say
>
> 2018-06-28 14:02:56 INFO ZooKeeperLeaderRetrievalService:100 - Starting ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
> 2018-06-28 14:02:56 INFO ZooKeeperLeaderRetrievalService:100 - Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
>
> Is this a relative path? Our settings are:
>
> high-availability.zookeeper.path.root: /flink_test
> high-availability.cluster-id: /da_15
>
> I do not see /leader/rest_server_lock during the CLI run, nor before or after it.
>
> I am a little stumped. I do not see the above logs on 1.4, so I am not
> sure whether /leader/rest_server_lock comes from the new code.
>
> On Thu, Jun 28, 2018 at 3:30 AM, Christophe Jolif <[email protected]> wrote:
>
>> Chesnay,
>>
>> Do you have a rough idea of the 1.5.1 timeline?
>>
>> Thanks,
>> --
>> Christophe
>>
>> On Mon, Jun 25, 2018 at 4:22 PM, Chesnay Schepler <[email protected]> wrote:
>>
>>> The watermark issue is known and will be fixed in 1.5.1.
>>>
>>> On 25.06.2018 15:03, Vishal Santoshi wrote:
>>>
>>> Thank you.
>>>
>>> One addition: I do not see WM info on the UI (attached).
>>>
>>> Is this a known issue? The same pipe on our production has the WM (in
>>> fact, we never had an issue with watermarks not appearing). Am I missing
>>> something?
>>>
>>> On Mon, Jun 25, 2018 at 4:15 AM, Fabian Hueske <[email protected]> wrote:
>>>
>>>> Hi Vishal,
>>>>
>>>> 1. I don't think a rolling update is possible. Flink 1.5.0 changed the
>>>> process orchestration and how the processes communicate.
>>>> IMO, the way to go is to start a Flink 1.5.0 cluster, take a savepoint
>>>> of the running job, start from the savepoint on the new cluster, and
>>>> shut the old job down.
>>>> 2. Savepoints should be compatible.
>>>> 3. You can keep the slot configuration as before.
>>>> 4. As I said before, mixing 1.5 and 1.4 processes does not work (or at
>>>> least, it was not considered a design goal and nobody paid attention to
>>>> whether it is possible).
>>>>
>>>> Best, Fabian
>>>>
>>>> 2018-06-23 13:38 GMT+02:00 Vishal Santoshi <[email protected]>:
>>>>
>>>>> 1. Can a rolling upgrade from 1.4 to 1.5 be done, or has anyone done
>>>>> one? I am not sure we can. It seems that the JM cannot recover jobs,
>>>>> failing with this exception:
>>>>>
>>>>> Caused by: java.io.InvalidClassException: org.apache.flink.runtime.jobgraph.tasks.CheckpointCoordinatorConfiguration; local class incompatible: stream classdesc serialVersionUID = -647384516034982626, local class serialVersionUID = 2
>>>>>
>>>>> 2. Does an SP taken on 1.4 resume on 1.5 (pretty basic, but no harm asking)?
>>>>>
>>>>> 3. https://ci.apache.org/projects/flink/flink-docs-release-1.5/release-notes/flink-1.5.html#update-configuration-for-reworked-job-deployment
>>>>> The taskmanager.numberOfTaskSlots: what would be the desired setting
>>>>> in a standalone (non-Mesos/YARN) cluster?
>>>>>
>>>>> 4. I suspend all jobs and establish 1.5 on the JM (the TMs are still
>>>>> running with 1.4). The JM refuses to start with
>>>>>
>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: 2018-06-23 11:34:23 ERROR JobManager:116 - Failed to recover job 454cd84a519f3b50e88bcb378d8a1330.
>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: java.lang.InstantiationError: org.apache.flink.runtime.blob.BlobKey
>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at sun.reflect.GeneratedSerializationConstructorAccessor51.newInstance(Unknown Source)
>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at java.io.ObjectStreamClass.newInstance(ObjectStreamClass.java:1079)
>>>>> Jun
>>>>> .....
>>>>>
>>>>> Any feedback would be highly appreciated...
>>
>> --
>> Christophe
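
On the relative-path question in the thread: the full znode path Gary quotes (/flink_test/da_15/leader/rest_server_lock) is what you get by concatenating high-availability.zookeeper.path.root, high-availability.cluster-id, and the relative path printed in the log. The sketch below only illustrates that concatenation; `znode_path` is a hypothetical helper written for this note, not a Flink or ZooKeeper API, and it assumes the three parts are simply joined with single slashes.

```python
def znode_path(root: str, cluster_id: str, relative: str) -> str:
    """Join HA root, cluster id and a relative znode path into one absolute path.

    Hypothetical helper for illustration; assumes plain slash-joining.
    """
    # Strip surrounding slashes so that "/flink_test" + "/da_15" does not
    # produce a double slash, then drop any empty components.
    parts = [p.strip("/") for p in (root, cluster_id, relative)]
    return "/" + "/".join(p for p in parts if p)


# Values taken from the configuration and log lines quoted in the thread.
root = "/flink_test"   # high-availability.zookeeper.path.root
cluster_id = "/da_15"  # high-availability.cluster-id

for relative in ("/leader/rest_server_lock", "/leader/dispatcher_lock"):
    print(znode_path(root, cluster_id, relative))
```

If that mapping holds, running "ls /flink_test/da_15/leader" in the ZooKeeper CLI (one level below the directory listed in the thread) would be the place to look for rest_server_lock.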
