Hi Vino, Yes, Job runs successfully, however, no checkpoints are successful. I will update the source
Regards, Vinay Patil On Fri, Jul 27, 2018 at 2:00 PM vino yang <yanghua1...@gmail.com> wrote: > Hi Vinay, > > Oh! You use a collection source? That's the problem. Please use a general > source like Kafka or others. Maybe your checkpoint has not be triggered, > your job has stopped. > > Thanks, vino. > > 2018-07-27 16:07 GMT+08:00 Vinay Patil <vinay18.pa...@gmail.com>: > >> Hi Vino, >> >> Yes I am enabling checkpoint in the code as follows : >> >> StreamExecutionEnvironment env = >> StreamExecutionEnvironment.createRemoteEnvironment("<job_manager_host>,<job_manager_port>,getJobConfiguration(),jarPath"); >> >> >> env.enableCheckpointing(1000); >> >> env.setSateBackend(new >> FsStateBackend("file:///<shared_mount_point_location>")); >> >> env.getCheckpointConfig().setMinPauseBetweenCheckpoints(1000); >> >> >> In getJobConfiguration method I have set HA related properties like >> HA_STORAGE_PATH,HA_ZOOKEEPER_QUORUM,HA_ZOOKEEPER_ROOT,HA_MODE,HA_JOB_MANAGER_PORT_RANGE,HA_CLUSTER_ID >> >> >> I can see the error in Job Manager logs where it says Collection Source >> is not being executed at the moment. Aborting checkpoint. In the pipeline I >> have a stream initialized using "fromCollection". I think I will have to >> get rid of this. >> >> What do you suggest >> >> Regards, >> Vinay Patil >> >> >> On Thu, Jul 26, 2018 at 12:04 PM vino yang <yanghua1...@gmail.com> wrote: >> >>> Hi Vinay: >>> >>> Did you call specific config API refer to this documentation[1]; >>> >>> Can you share your job program and JM Log? Or the JM log contains the >>> log message like this pattern "Triggering checkpoint {} @ {} for job {}."? >>> >>> [1]: >>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing >>> >>> Thanks, vino. >>> >>> 2018-07-25 19:43 GMT+08:00 Chesnay Schepler <ches...@apache.org>: >>> >>>> Can you provide us with the job code? >>>> >>>> I assume that checkpointing runs properly if you submit the same job to >>>> a normal cluster? >>>> >>>> >>>> On 25.07.2018 13:15, Vinay Patil wrote: >>>> >>>> No error in the logs. That is why I am not able to understand why >>>> checkpoints are not getting triggered. >>>> >>>> Regards, >>>> Vinay Patil >>>> >>>> >>>> On Wed, Jul 25, 2018 at 4:44 PM Vinay Patil <vinay18.pa...@gmail.com> >>>> wrote: >>>> >>>>> Hi Chesnay, >>>>> >>>>> No error in the logs. That is why I am not able to understand why >>>>> checkpoints are getting triggered. >>>>> >>>>> Regards, >>>>> Vinay Patil >>>>> >>>>> >>>>> On Wed, Jul 25, 2018 at 4:36 PM Chesnay Schepler <ches...@apache.org> >>>>> wrote: >>>>> >>>>>> Please check the job- and taskmanager logs for anything suspicious. >>>>>> >>>>>> On 25.07.2018 12:33, Vinay Patil wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I am starting the cluster using bootstrap application where in I am >>>>>> calling Job Manager and Task Manager main class to form the cluster. The >>>>>> HA >>>>>> cluster is formed correctly and I am able to submit jobs to this cluster >>>>>> using RemoteExecutionEnvironment but when I enable checkpointing in code >>>>>> I >>>>>> do not see any checkpoints triggered on Flink UI. >>>>>> >>>>>> Am I missing any configurations to be set for the >>>>>> RemoteExecutionEnvironment for checkpointing to work. >>>>>> >>>>>> >>>>>> Regards, >>>>>> Vinay Patil >>>>>> >>>>>> >>>>>> >>>> >>> >