Re: how to match external checkpoints with jobs during recovery

2018-02-06 Thread Aljoscha Krettek
where the data for all jobs is written to. Best, Aljoscha > On 5. Feb 2018, at 20:27, xiatao123 wrote: > > The external checkpoints are in the format of > checkpoint_metadata-0057 > which I have no idea which job this checkpoint metadata belongs to if I have > multi

how to match external checkpoints with jobs during recovery

2018-02-05 Thread xiatao123
The external checkpoints are in the format checkpoint_metadata-0057, so I have no idea which job a given checkpoint metadata file belongs to if I have multiple jobs running at the same time. If a job fails unexpectedly, I need to know which checkpoints belong to the failed job. Is there API

Re: external checkpoints

2017-11-27 Thread Aljoscha Krettek
r=s3://aljoscha/data-generator-external-checkpoints \ ../flink-state-machine-kafka011-1.0-SNAPSHOT.jar --parallelism 2 --topic events6 --bootstrap.servers some-other-ip:9092 --numKeys 1000 --sleep 1 --checkpointDir s3://aljoscha/data-generator-11-checkpoints --externalizedCheckpoints true This is

Re: external checkpoints

2017-11-24 Thread Fabian Hueske
Hi Aviad, sorry for the late reply. You can configure the checkpoint directory (which is also used for externalized checkpoints) when you create the state backend: env.setStateBackend(new RocksDBStateBackend("hdfs:///checkpoints-data/")); This configures the checkpoint directory to be hdfs:///che
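
If each job is given its own checkpoint directory, a metadata file can be matched to its job from the path alone. A minimal sketch of that bookkeeping in plain Java, with no Flink dependency: the class name JobCheckpointDirs, the layout <base>/<jobName>/ and the job names are hypothetical conventions chosen for illustration, not something Flink prescribes.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: derives one checkpoint directory per job and maps a
// metadata path back to the job that owns it. The <base>/<jobName>/ layout
// is an assumption made for this sketch.
class JobCheckpointDirs {

    // Builds the per-job directory, e.g. "hdfs:///checkpoints-data/job-a/".
    static String dirFor(String baseUri, String jobName) {
        String base = baseUri.endsWith("/") ? baseUri : baseUri + "/";
        return base + jobName + "/";
    }

    // Returns the job whose directory contains the given metadata path, if any.
    static Optional<String> jobFor(String baseUri, List<String> jobNames, String metadataPath) {
        for (String job : jobNames) {
            if (metadataPath.startsWith(dirFor(baseUri, job))) {
                return Optional.of(job);
            }
        }
        return Optional.empty();
    }
}
```

The string returned by dirFor would be the value passed to the state backend constructor for that job; jobFor is what a recovery script would call once it has found a metadata file.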

Re: external checkpoints

2017-11-16 Thread aviad
Hi, thanks for the answer. I can use the first option (REST API). For some reason it is undocumented in the Flink documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/rest_api.html). Regarding the second option, configure each job with an externalized checkpoint direct

Re: external checkpoints

2017-11-15 Thread Jins George
Hi Aviad, I had a similar situation and my solution was to use the Flink monitoring REST API (/jobs/{jobid}/checkpoints) to get the mapping between job and checkpoint file. Wrap this in a script and run it periodically (in my case, every 30 sec). You can also configure each job with an external
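
The REST-based mapping described above could be sketched roughly as follows. This only covers the step of pulling checkpoint locations out of a response body that has already been fetched from /jobs/{jobid}/checkpoints. It assumes the response contains string fields named "external_path", which should be checked against the Flink version in use; the class name ExternalPaths is hypothetical, and a real script would more likely use a proper JSON parser than a regular expression.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the distinct "external_path" string values
// found in a checkpoint statistics response. The field name is an assumption
// about the response format, not something confirmed by the thread.
class ExternalPaths {

    // Matches "external_path": "<value>"; null values have no quotes and are skipped.
    // JSON escape sequences inside the value are not decoded.
    private static final Pattern EXTERNAL_PATH =
            Pattern.compile("\"external_path\"\\s*:\\s*\"([^\"]*)\"");

    static List<String> extract(String json) {
        Set<String> paths = new LinkedHashSet<>();
        Matcher m = EXTERNAL_PATH.matcher(json);
        while (m.find()) {
            paths.add(m.group(1));
        }
        return new ArrayList<>(paths);
    }
}
```

Calling extract once per job ID and storing the result next to that ID gives the job-to-checkpoint mapping that a periodic script would maintain.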

external checkpoints

2017-11-15 Thread Aviad Rotem
Hi, I have several jobs which are configured for external checkpointing (enableExternalizedCheckpoints). How can I correlate checkpoints with jobs? For example, if I want to write a script which monitors whether the job is up and, if the job is down, resumes the job from the externalized chec

Re: External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-09-04 Thread Stefan Richter
le, can you attach a >> profiler/sampling to your job manager and figure out the hotspot methods >> where most time is spent? This would be very helpful as a starting point >> where the problem is potentially caused. >> >> Best, >> Stefan >> >>>

Re: External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-07-09 Thread Jared Stehler
is spent? This would be very helpful as a starting point > where the problem is potentially caused. > > Best, > Stefan > >> On 29.06.2017 at 18:02, Jared Stehler wrote: >> >> We’re seeing

Re: External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-07-03 Thread Ufuk Celebi
On Mon, Jul 3, 2017 at 12:02 PM, Stefan Richter wrote: > Another thing that could be really helpful, if possible, can you attach a > profiler/sampling to your job manager and figure out the hotspot methods > where most time is spent? This would be very helpful as a starting point > where the probl

Re: External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-07-03 Thread Stefan Richter
/sampling to your job manager and figure out the hotspot methods where most time is spent? This would be very helpful as a starting point where the problem is potentially caused. Best, Stefan > On 29.06.2017 at 18:02, Jared Stehler wrote: > > We’re seeing our external checkpoints di

External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-06-29 Thread Jared Stehler
We’re seeing our external checkpoints directory grow in an unbounded fashion… after upgrading to Flink 1.3. We are using Flink-Mesos. In 1.2 (HA standalone mode), we saw (correctly) that only the latest external checkpoint was being retained (i.e., respecting state.checkpoints.num-retained