OK, a few things:

2018-06-26 13:31:29 INFO CliFrontend:282 - Starting Command Line Client (Version: 1.5.0, Rev:c61b108, Date:24.05.2018 @ 14:54:44 UTC)
...
2018-06-26 13:31:31 INFO ClientCnxn:876 - Socket connection established to zk-f1fb95b9.bf2.tumblr.net/10.246.218.17:2181, initiating session
2018-06-26 13:31:31 DEBUG ClientCnxn:949 - Session establishment request sent on zk-f1fb95b9.bf2.tumblr.net/10.246.218.17:2181
2018-06-26 13:31:31 INFO ClientCnxn:1299 - Session establishment complete on server zk-f1fb95b9.bf2.tumblr.net/10.246.218.17:2181, sessionid = 0x35add547801ea07, negotiated timeout = 40000
2018-06-26 13:31:31 INFO RestClient:119 - Rest client endpoint started.
2018-06-26 13:31:31 INFO ZooKeeperLeaderRetrievalService:100 - Starting ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null finished:false header:: 1,3 replyHeader:: 1,60416530560,0 request:: '/flink_test,F response:: s{47265479496,47265479496,1489163688703,1489163688703,0,2,0,0,0,2,60416492885}
2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null finished:false header:: 2,3 replyHeader:: 2,60416530560,0 request:: '/flink_test/da_15,F response:: s{60416492885,60416492885,1529755199131,1529755199131,0,5,0,0,0,5,60416521584}
2018-06-26 13:31:31 INFO ZooKeeperLeaderRetrievalService:100 - Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null finished:false header:: 3,3 replyHeader:: 3,60416530560,0 request:: '/flink_test,F response:: s{47265479496,47265479496,1489163688703,1489163688703,0,2,0,0,0,2,60416492885}
2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null finished:false header:: 4,3 replyHeader:: 4,60416530560,0 request:: '/flink_test/da_15,F response:: s{60416492885,60416492885,1529755199131,1529755199131,0,5,0,0,0,5,60416521584}
2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null finished:false header:: 5,3 replyHeader:: 5,60416530560,0 request:: '/flink_test/da_15/leader,F response:: s{60416492887,60416492887,1529755199191,1529755199191,0,1,0,0,0,1,60416492888}
2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply sessionid:0x35add547801ea07, packet:: clientPath:/flink_test/da_15/leader/rest_server_lock serverPath:/flink_test/da_15/leader/rest_server_lock finished:false header:: 6,3 replyHeader:: 6,60416530560,-101 request:: '/flink_test/da_15/leader/rest_server_lock,T response::
2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null finished:false header:: 7,3 replyHeader:: 7,60416530560,0 request:: '/flink_test,F response:: s{47265479496,47265479496,1489163688703,1489163688703,0,2,0,0,0,2,60416492885}
2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null finished:false header:: 8,3 replyHeader:: 8,60416530560,0 request:: '/flink_test/da_15,F response:: s{60416492885,60416492885,1529755199131,1529755199131,0,5,0,0,0,5,60416521584}
2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null finished:false
header:: 9,3 replyHeader:: 9,60416530560,0 request:: '/flink_test/da_15/leader,F response:: s{60416492887,60416492887,1529755199191,1529755199191,0,1,0,0,0,1,60416492888}
2018-06-26 13:31:31 DEBUG ClientCnxn:843 - Reading reply sessionid:0x35add547801ea07, packet:: clientPath:/flink_test/da_15/leader/dispatcher_lock serverPath:/flink_test/da_15/leader/dispatcher_lock finished:false header:: 10,3 replyHeader:: 10,60416530560,-101 request:: '/flink_test/da_15/leader/dispatcher_lock,T response::
2018-06-26 13:31:31 INFO CliFrontend:914 - Waiting for response...
Waiting for response...
2018-06-26 13:31:44 DEBUG ClientCnxn:742 - Got ping response for sessionid: 0x35add547801ea07 after 0ms
2018-06-26 13:31:58 DEBUG ClientCnxn:742 - Got ping response for sessionid: 0x35add547801ea07 after 0ms
2018-06-26 13:32:01 INFO RestClient:123 - Shutting down rest endpoint.
2018-06-26 13:32:01 INFO RestClient:140 - Rest endpoint shutdown complete.
2018-06-26 13:32:01 INFO ZooKeeperLeaderRetrievalService:117 - Stopping ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
2018-06-26 13:32:01 INFO ZooKeeperLeaderRetrievalService:117 - Stopping ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
2018-06-26 13:32:01 DEBUG CuratorFrameworkImpl:282 - Closing
2018-06-26 13:32:01 INFO CuratorFrameworkImpl:821 - backgroundOperationsLoop exiting
2018-06-26 13:32:01 DEBUG CuratorZookeeperClient:199 - Closing
2018-06-26 13:32:01 DEBUG ConnectionState:115 - Closing
2018-06-26 13:32:01 DEBUG ZooKeeper:673 - Closing session: 0x35add547801ea07
2018-06-26 13:32:01 DEBUG ClientCnxn:1370 - Closing client for session: 0x35add547801ea07
2018-06-26 13:32:01 DEBUG ClientCnxn:843 - Reading reply sessionid:0x35add547801ea07, packet:: clientPath:null serverPath:null finished:false header:: 11,-11 replyHeader:: 11,60416530561,0 request:: null response:: null
2018-06-26 13:32:01 DEBUG ClientCnxn:1354 - Disconnecting client for session: 0x35add547801ea07
2018-06-26 13:32:01 INFO ZooKeeper:684 - Session: 0x35add547801ea07 closed
2018-06-26 13:32:01 INFO ClientCnxn:519 - EventThread shut down for session: 0x35add547801ea07
2018-06-26 13:32:01 DEBUG ClientCnxn:1146 - An exception was thrown while closing send thread for session 0x35add547801ea07 : Unable to read additional data from server sessionid 0x35add547801ea07, likely server has closed socket
2018-06-26 13:32:01 ERROR CliFrontend:891 - Error while running the command.
org.apache.flink.util.FlinkException: Failed to retrieve job list.
    at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:429)
    at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:412)

On Tue, Jun 26, 2018 at 5:43 AM, zhangminglei <18717838...@163.com> wrote:

> By the way, this was in an HA setup.
>
> On June 26, 2018, at 5:39 PM, zhangminglei <18717838...@163.com> wrote:
>
> Hi, Gary Yao
>
> I once found that the IP address ( jobmanager.rpc.address ) had changed
> from 10.208.73.129 to localhost. I think that would cause the issue.
> What do you think?
>
> Cheers
> Minglei
>
> On June 26, 2018, at 4:53 PM, Gary Yao <g...@data-artisans.com> wrote:
>
> Hi Vishal,
>
> Could it be that you are not using the 1.5.0 client?
> The stacktrace you posted does not reference valid lines of code in the
> release-1.5.0-rc6 tag.
>
> If you have an HA setup, the host and port of the leading JM will be
> looked up from ZooKeeper before job submission. Therefore, the
> flink-conf.yaml used by the client must have the same ZooKeeper
> configuration as used by the Flink cluster.
>
> Best,
> Gary
>
> On Mon, Jun 25, 2018 at 5:32 PM, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>
>> I think all I need to add is
>>
>> web.port: 8081
>> rest.port: 8081
>>
>> to the JM flink conf ?
>>
>> On Mon, Jun 25, 2018 at 10:46 AM, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>>
>>> Another issue I saw with flink cli...
>>>
>>> org.apache.flink.client.program.ProgramInvocationException: The program execution failed: JobManager did not respond within 120000 ms
>>> at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:524)
>>> at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:103)
>>> at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:456)
>>> at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
>>> at org.apach
>>>
>>> This was a simple submission and it does succeed through the UI.
>>>
>>> Has there been a regression on the CLI? I could not find any
>>> documentation around it.
>>>
>>> I have an HA JM setup.
>>>
>>> On Mon, Jun 25, 2018 at 10:22 AM, Chesnay Schepler <ches...@apache.org> wrote:
>>>
>>>> The watermark issue is known and will be fixed in 1.5.1
>>>>
>>>> On 25.06.2018 15:03, Vishal Santoshi wrote:
>>>>
>>>> Thank you....
>>>>
>>>> One addition
>>>>
>>>> I do not see WM info on the UI ( Attached )
>>>>
>>>> Is this a known issue? The same pipe on our production has the WM ( In
>>>> fact never had an issue with Watermarks not appearing ) . Am I missing
>>>> something ?
>>>>
>>>> On Mon, Jun 25, 2018 at 4:15 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>>
>>>>> Hi Vishal,
>>>>>
>>>>> 1. I don't think a rolling update is possible. Flink 1.5.0 changed the
>>>>> process orchestration and how the processes communicate. IMO, the way
>>>>> to go is to start a Flink 1.5.0 cluster, take a savepoint on the
>>>>> running job, start from the savepoint on the new cluster and shut the
>>>>> old job down.
>>>>> 2. Savepoints should be compatible.
>>>>> 3. You can keep the slot configuration as before.
>>>>> 4. As I said before, mixing 1.5 and 1.4 processes does not work (or at
>>>>> least, it was not considered a design goal and nobody checked whether
>>>>> it is possible).
>>>>>
>>>>> Best, Fabian
>>>>>
>>>>> 2018-06-23 13:38 GMT+02:00 Vishal Santoshi <vishal.santo...@gmail.com>:
>>>>>
>>>>>> 1. Can or has anyone done a rolling upgrade from 1.4 to 1.5 ? I am
>>>>>> not sure we can. It seems that the JM cannot recover jobs, failing
>>>>>> with this exception
>>>>>>
>>>>>> Caused by: java.io.InvalidClassException:
>>>>>> org.apache.flink.runtime.jobgraph.tasks.CheckpointCoordinatorConfiguration;
>>>>>> local class incompatible: stream classdesc serialVersionUID =
>>>>>> -647384516034982626, local class serialVersionUID = 2
>>>>>>
>>>>>> 2. Does a SP taken on 1.4 resume on 1.5 ( pretty basic but no harm
>>>>>> asking ) ?
>>>>>>
>>>>>> 3. https://ci.apache.org/projects/flink/flink-docs-release-1.5/release-notes/flink-1.5.html#update-configuration-for-reworked-job-deployment
>>>>>> The taskmanager.numberOfTaskSlots: What would be the desired setting
>>>>>> in a stand alone ( non mesos/yarn ) cluster ?
>>>>>>
>>>>>> 4. I suspended all jobs and set up 1.5 on the JM ( the TMs are still
>>>>>> running with 1.4 ) .
>>>>>> The JM refuses to start with
>>>>>>
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: 2018-06-23 11:34:23 ERROR JobManager:116 - Failed to recover job 454cd84a519f3b50e88bcb378d8a1330.
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: java.lang.InstantiationError: org.apache.flink.runtime.blob.BlobKey
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at sun.reflect.GeneratedSerializationConstructorAccessor51.newInstance(Unknown Source)
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at java.io.ObjectStreamClass.newInstance(ObjectStreamClass.java:1079)
>>>>>> Jun
>>>>>> .....
>>>>>>
>>>>>> Any feedback would be highly appreciated...
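
Gary's point in this thread, that the flink-conf.yaml used by the client must carry the same ZooKeeper configuration as the cluster, can be checked mechanically. Below is a minimal Python sketch of such a check; it is not part of Flink. It assumes both files use flat `key: value` lines and that the relevant settings share the `high-availability` prefix (for example `high-availability.zookeeper.quorum`, `high-availability.zookeeper.path.root` and `high-availability.cluster-id`). The function names and the sample values are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_flink_conf(text: str) -> Dict[str, str]:
    """Parse flat 'key: value' lines, ignoring blanks and '#' comments.

    Assumption: values do not themselves contain '#'.
    """
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values such as
        # 'zk1:2181,zk2:2181' stay intact.
        key, _, value = line.partition(":")
        conf[key.strip()] = value.strip()
    return conf


def ha_mismatches(
    client: Dict[str, str],
    cluster: Dict[str, str],
    prefix: str = "high-availability",
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return HA keys whose values differ, as key -> (client, cluster)."""
    keys = sorted(k for k in set(client) | set(cluster) if k.startswith(prefix))
    return {
        k: (client.get(k), cluster.get(k))
        for k in keys
        if client.get(k) != cluster.get(k)
    }


if __name__ == "__main__":
    # Hypothetical file contents; in practice read the two flink-conf.yaml files.
    client_conf = parse_flink_conf(
        "high-availability: zookeeper\n"
        "high-availability.zookeeper.quorum: zk1:2181\n"
        "high-availability.cluster-id: /da_15\n"
    )
    cluster_conf = parse_flink_conf(
        "high-availability: zookeeper\n"
        "high-availability.zookeeper.quorum: zk1:2181\n"
        "high-availability.cluster-id: /da_16\n"
    )
    for key, (c, s) in ha_mismatches(client_conf, cluster_conf).items():
        print(f"{key}: client={c!r} cluster={s!r}")
```

An empty result only says the HA-related keys match textually; it does not prove the client and cluster are running the same Flink version, which is the other thing Gary asks about.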