Re: Checkpointing while loading causing issues

2024-05-14 Thread gongzhongqiang
Hi Lars, Currently, there is no configuration available to trigger a checkpoint immediately after the job starts in Flink. But we can address this issue from multiple perspectives using the insights provided in this document [1]. [1] https://nightlies.apache.org/flink/flink-docs-release-1.19/

Re: Checkpointing

2024-05-08 Thread Muhammet Orazov via user
Hey Jacob, If you understand how the Kafka offset managed in the checkpoint, then you could map this notion to other Flink sources. I would suggest to read the Data Sources[1] document and FLIP-27[5]. Each source should define a `Split`, then it is `SourceReaderBase`[2] class' responsibility to

Re: Checkpointing and savepoints can never complete after inconsistency

2023-07-10 Thread Alexis Sarda-Espinosa
I found out someone else reported this and found a workaround: https://issues.apache.org/jira/browse/FLINK-32241 Am Mo., 10. Juli 2023 um 16:45 Uhr schrieb Alexis Sarda-Espinosa < sarda.espin...@gmail.com>: > Hi again, > > I have found out that this issue occurred in 3 different clusters, and 2 >

Re: Checkpointing and savepoints can never complete after inconsistency

2023-07-10 Thread Alexis Sarda-Espinosa
Hi again, I have found out that this issue occurred in 3 different clusters, and 2 of them could not recover after restarting pods, it seems state was completely corrupted afterwards and was thus lost. I had never seen this before 1.17.1, so it might be a newly introduced problem. Regards, Alexis

Re: Checkpointing - Job Manager S3 Head Requests are 404s

2022-05-20 Thread David Anderson
One more thing to be aware of: the Presto S3 implementation has issues too. See FLINK-24392 [1]. This means that there's no ideal solution, and in some cases it is preferable to use Hadoop, perhaps in combination with increasing the value of state.storage.fs.memory-threshold [2] in order to decreas

Re: Checkpointing - Job Manager S3 Head Requests are 404s

2022-05-19 Thread Aeden Jameson
Great, that all makes sense to me. Thanks again. On Thu, May 19, 2022 at 11:42 AM David Anderson wrote: > > Sure, happy to try to help. > > What's happening with the hadoop filesystem is that before it writes each key > it checks to see if the "parent directory" exists by checking for a key with

Re: Checkpointing - Job Manager S3 Head Requests are 404s

2022-05-19 Thread David Anderson
Sure, happy to try to help. What's happening with the hadoop filesystem is that before it writes each key it checks to see if the "parent directory" exists by checking for a key with the prefix up to the last "/", and if that key isn't found it then creates empty marker files to cause of that pare

Re: Checkpointing - Job Manager S3 Head Requests are 404s

2022-05-19 Thread Aeden Jameson
Thanks for the response David. That's the conclusion I came to as well. The Hadoop plugin behavior doesn't appear to reflect more recent changes to S3 like strong read-after-write consistency, https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/ . Given the impro

Re: Checkpointing - Job Manager S3 Head Requests are 404s

2022-05-19 Thread David Anderson
Aeden, this is probably happening because you are using the Hadoop implementation of S3. The Hadoop S3 filesystem tries to imitate a filesystem on top of S3. In so doing it makes a lot of HEAD requests. These are expensive, and they violate read-after-create visibility, which is what you seem to b

Re: Checkpointing in StateFun

2022-03-14 Thread Seth Wiesman
where > I can see how Raw state can be used in Flink? > > > Best Regards, > > Christopher Gustafson > -- > *Från:* Seth Wiesman > *Skickat:* den 11 mars 2022 17:57:21 > *Till:* Christopher Gustafson > *Kopia:* user@flink.apache.org >

Re: Checkpointing in StateFun

2022-03-11 Thread Seth Wiesman
I assume you are talking about the checkpointing in the feedback package? StateFun only relies on Flink checkpointing for fault tolerance. All state is stored in standard checkpoint / savepoints and can be used to restore from failure, upgrade a job, rescale, etc. Just like any other snapshot. St

Re: Checkpointing failure, subtasks get stuck

2021-09-02 Thread JING ZHANG
Hi Xiangyu Su, Because of the lack of detailed information, I could only give the troubleshooting ideas. I hope it is helpful to you. 1. find out which checkpoint expire. You could find that in WEB UI [1] or in `jobmanager.log` 2. find out operators which not finished checkpoint yet when the checkp

Re: Checkpointing failure, subtasks get stuck

2021-09-02 Thread Till Rohrmann
Hi Xiangyu, Can you provide us with more information about your job, which state backend you are using and how you've configured the checkpointing? Can you also provide some information about the problematic checkpoints (e.g. alignment time, async/sync duration) that you find on the checkpoint det

Re: Checkpointing Completed and then failed

2021-03-11 Thread Arvid Heise
Hi Abdullah, without specific logs, it's hard to diagnose what went wrong. Could you check in your taskmanager logs if any error occurred and add it? In Flink UI, you can also browse the latest exceptions and look at the checkpoint history. That may give you (and us) additional insights. On Thu,

RE: RE: checkpointing seems to be throttled.

2020-12-24 Thread Colletta, Edward
, Edward Sent: Monday, December 21, 2020 12:32 PM To: Yun Gao ; user@flink.apache.org Subject: RE: RE: checkpointing seems to be throttled. Doh! Yeah, we set the state backend in code and I read the flink-conf.yaml file and use the high-availability storage dir. From: Yun Gao mailto:yungao

RE: RE: checkpointing seems to be throttled.

2020-12-21 Thread Colletta, Edward
Doh! Yeah, we set the state backend in code and I read the flink-conf.yaml file and use the high-availability storage dir. From: Yun Gao Sent: Monday, December 21, 2020 11:28 AM To: Colletta, Edward ; user@flink.apache.org Subject: Re: RE: checkpointing seems to be throttled. This email is

Re: RE: checkpointing seems to be throttled.

2020-12-21 Thread Yun Gao
FS mounted directory. We do monitor backpressure through rest api periodically and we do not see any. From: Yun Gao Sent: Monday, December 21, 2020 10:40 AM To: Colletta, Edward ; user@flink.apache.org Subject: Re: checkpointing seems to be throttled. This email is from an external source -exerci

RE: checkpointing seems to be throttled.

2020-12-21 Thread Colletta, Edward
; user@flink.apache.org Subject: Re: checkpointing seems to be throttled. This email is from an external source - exercise caution regarding links and attachments. Hi Edward, For the second issue, have you also set the statebackend type? I'm asking so because except for the default

Re: checkpointing seems to be throttled.

2020-12-21 Thread Yun Gao
Hi Edward, For the second issue, have you also set the statebackend type? I'm asking so because except for the default heap statebackend, other statebackends should throws exception if the state.checkpoint.dir is not set. Since heap statebackend stores all the snapshots in the JM's memory,

Re: checkpointing opening too many file

2020-05-07 Thread David Anderson
With the FsStateBackend you could also try increasing the value of state.backend.fs.memory-threshold [1]. Only those state chunks that are larger than this value are stored in separate files; smaller chunks go into the checkpoint metadata file. The default is 1KB, increasing this should reduce file

Re: checkpointing opening too many file

2020-05-06 Thread Congxian Qiu
Hi Yes, for your use case, if you do not have large state size, you can try to use FsStateBackend. Best, Congxian ysnakie 于2020年4月27日周一 下午3:42写道: > Hi > If I use FsStateBackend instead of RocksdbFsStateBackend, will the open > files decrease significantly? I dont have large state size. > > tha

Re: checkpointing opening too many file

2020-04-24 Thread Congxian Qiu
Hi If there are indeed so many files need to upload to hdfs, then currently we do not have any solutions to limit the open files, there exist an issue[1] wants to fix this problem, and a pr for it, maybe you can try the attached pr to try it can solve your problem. [1] https://issues.apache.org/ji

Re: Checkpointing is not performing well

2019-09-11 Thread Vijay Bhaskar
having any side effects. Use that value as task manager count and >>>>> then start adding your state backend. First you can try with Rocks DB. >>>>> With >>>>> reduced task manager count you might get good results. >>>>> >>>>> Reg

Re: Checkpointing is not performing well

2019-09-11 Thread Fabian Hueske
ep 8, 2019 at 10:15 AM Rohan Thimmappa < >>>> rohan.thimma...@gmail.com> wrote: >>>> >>>>> Ravi, have you looked at the io operation(iops) rate of the disk? You >>>>> can monitoring the iops performance and tune it accordingly with your wor

Re: Checkpointing is not performing well

2019-09-11 Thread Ravi Bhushan Ratnakar
ppa < >>> rohan.thimma...@gmail.com> wrote: >>> >>>> Ravi, have you looked at the io operation(iops) rate of the disk? You >>>> can monitoring the iops performance and tune it accordingly with your work >>>> load. This helped us in our proje

Re: Checkpointing is not performing well

2019-09-10 Thread Vijay Bhaskar
ch all the parameters. >>> >>> Rohan >>> >>> >>> -- >>> *From:* Ravi Bhushan Ratnakar >>> *Sent:* Saturday, September 7, 2019 5:38 PM >>> *To:* Rafi Aroch >>> *Cc:* user >>> *Subj

Re: Checkpointing is not performing well

2019-09-10 Thread Ravi Bhushan Ratnakar
t; Rohan >> >> >> -- >> *From:* Ravi Bhushan Ratnakar >> *Sent:* Saturday, September 7, 2019 5:38 PM >> *To:* Rafi Aroch >> *Cc:* user >> *Subject:* Re: Checkpointing is not performing well >> >> Hi Rafi, >>

Re: Checkpointing is not performing well

2019-09-10 Thread Vijay Bhaskar
s. > > Rohan > > > -- > *From:* Ravi Bhushan Ratnakar > *Sent:* Saturday, September 7, 2019 5:38 PM > *To:* Rafi Aroch > *Cc:* user > *Subject:* Re: Checkpointing is not performing well > > Hi Rafi, > > Thank you for your quick

Re: Checkpointing is not performing well

2019-09-07 Thread Rohan Thimmappa
: Ravi Bhushan Ratnakar Sent: Saturday, September 7, 2019 5:38 PM To: Rafi Aroch Cc: user Subject: Re: Checkpointing is not performing well Hi Rafi, Thank you for your quick response. I have tested with rocksdb state backend. Rocksdb required significantly more taskmanager to perform as compare to

Re: Checkpointing is not performing well

2019-09-07 Thread Ravi Bhushan Ratnakar
Hi Rafi, Thank you for your quick response. I have tested with rocksdb state backend. Rocksdb required significantly more taskmanager to perform as compare to filesystem state backend. The problem here is that checkpoint process is not fast enough to complete. Our requirement is to do checkout a

Re: Checkpointing is not performing well

2019-09-07 Thread Rafi Aroch
Hi Ravi, Consider moving to RocksDB state backend, where you can enable incremental checkpointing. This will make you checkpoints size stay pretty much constant even when your state becomes larger. https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/state/state_backends.html#the-rocks

Re: Checkpointing & File stream with

2019-06-18 Thread Sung Gon Yi
It works well now with following codes: —— TextInputFormat specFileFormat = new TextInputFormat(new Path(specFile)); specFileFormat.setFilesFilter(FilePathFilter.createDefaultFilter()); DataStream specificationFileStream = env .readFile(specFileFormat, specFile, FileProcessingMode.PROCESS_

Re: Checkpointing & File stream with

2019-06-17 Thread Yun Tang
Hi Sung How about using FileProcessingMode.PROCESS_CONTINUOUSLY [1] as watch type when reading data from HDFS. FileProcessingMode.PROCESS_CONTINUOUSLY would periodically monitor the source while default FileProcessingMode.PROCESS_ONCE would only process once the data and exit. [1] https://ci.

Re: Checkpointing and save pointing

2019-05-10 Thread Fabian Hueske
For checkpoints and savepoints, the JM and all TMs need access to the same storage system. This can be shared NFS that is mounted on each machine. Best, Fabian Am Fr., 10. Mai 2019 um 15:15 Uhr schrieb Boris Lublinsky < boris.lublin...@lightbend.com>: > For now is a regular Link cluster, > But e

Re: Checkpointing and save pointing

2019-05-10 Thread Boris Lublinsky
For now is a regular Link cluster, But even there I want to use both check and save pointing. We do not want to use Hadoop, but rather shared fs - NFS/Gluster. I was trying to see whether volumes need to be mounted only for Job manager or both. HA is the next step. Trying to find the code saving c

Re: Checkpointing and save pointing

2019-05-10 Thread Fabian Hueske
Hi Boris, Is your question is in the context of replacing Zookeeper by a different service for highly-available setups or are you setting up a regular Flink cluster? Best, Fabian Am Mi., 8. Mai 2019 um 06:20 Uhr schrieb Congxian Qiu < qcx978132...@gmail.com>: > Hi, Boris > > TM will also need

Re: Checkpointing and save pointing

2019-05-07 Thread Congxian Qiu
Hi, Boris TM will also need to write to the external volume. Best, Congxian On May 8, 2019, 03:56 +0800, Boris Lublinsky , wrote: > I am planning to use external volume for this. My understanding is that it > needs to be mounted only to the job manager, not the task managers. Is this > correct

Re: checkpointing when yarn session crashed

2019-04-08 Thread Guowei Ma
Could you give more details? Such as which flink version do you use? which Statebackend do you use? Does there has any successful checkpoint? and so on.. I can't reproduce your problem. (I used BucketingSinkTestProgram(enable external checkpoint) + Flink 1.7.2 and default StateBackend ) Best, Gu

Re: Checkpointing to gcs taking too long

2018-11-29 Thread Chesnay Schepler
Please provide the full Exception stack trace and the configuration of your job (parallelism, number of stateful operators). Have you tried using the gcs-connector in isolation? This may not be an issue with Flink. On 28.11.2018 07:01, prakhar_mathur wrote: I am trying to run flink on kubernet

Re: Checkpointing when one of the sources has completed

2018-10-17 Thread Paul Lam
Hi Niels, The link was broken, it should be https://issues.apache.org/jira/browse/FLINK-2491 . A similar question was asked a few days ago. Best, Paul Lam > 在 2018年10月17日,19:56,Niels van Kaam 写道: > > Hi All, > > Thanks for the responses,

Re: Checkpointing when one of the sources has completed

2018-10-17 Thread Niels van Kaam
Hi All, Thanks for the responses, the finished source explains my issue then. I can work around the problem by letting my sources negotiate a "final" checkpoint via zookeeper. @Paul, I think your answer was meant for the earlier question asked by Joshua? Cheers, Niels On Wed, Oct 17, 2018 at 11

Re: Checkpointing when one of the sources has completed

2018-10-17 Thread Joshua Fan
Hi Niels, Probably not, an operator begins to do checkpoint until it gets all the barriers from all the upstream sources, if one source can not send a barrier, the downstream operator can not do checkpoint, FYI. Yours sincerely Joshua On Wed, Oct 17, 2018 at 4:58 PM Niels van Kaam wrote: > Hi

Re: Checkpointing when one of the sources has completed

2018-10-17 Thread Fabian Hueske
Hi Niels, Checkpoints can only complete if all sources are running. That's because the checkpoint mechanism relies on injecting checkpoint barriers into the stream at the sources. Best, Fabian Am Mi., 17. Okt. 2018 um 11:11 Uhr schrieb Paul Lam : > Hi Niels, > > Please see https://issues.apache

Re: Checkpointing when one of the sources has completed

2018-10-17 Thread Paul Lam
Hi Niels, Please see https://issues.apache.org/jira/browse/FLINK-249 . Best, Paul Lam > 在 2018年10月17日,16:58,Niels van Kaam 写道: > > Hi All, > > I am debugging an issue where the periodic checkpointing has halted. I > noticed that one of the so

Re: Checkpointing not working

2018-09-20 Thread Stefan Richter
Hi, in the absence of any logs, my guess would be that your checkpoints are just not able to complete within 10 seconds, the state might be to large or the network and fs to slow. Are you using full or incremental checkpoints? For your relative small interval, I suggest that you try using incre

Re: Checkpointing not working

2018-09-20 Thread vino yang
Hi Yubraj, Can you set your log print level to DEBUG and share it with us or share a screenshot of your Flink web UI checkpoint information? Thanks, vino. Jörn Franke 于2018年9月19日周三 下午2:37写道: > What do the logfiles say? > > How does the source code looks like? > > Is it really needed to do chec

Re: ***UNCHECKED*** Re: Checkpointing not working

2018-09-19 Thread Vijay Bhaskar
Can you please check the following document and verify whether you have enough network bandwidth to support 30 seconds check point interval worth of the streaming data? https://data-artisans.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines Regards Bhaskar On Wed, Sep 19, 2018 at

***UNCHECKED*** Re: Checkpointing not working

2018-09-18 Thread yuvraj singh
log :: Checkpoint 58 of job 0efaa0e6db5c38bec81dfefb159402c0 expired before completing. I have a use case where i need to do the checkpointing frequently . i am using Kafka to read stream and making a window of 1 hour , which is having 50gb data always and it can be more than that . i have seen

Re: Checkpointing not working

2018-09-18 Thread Jörn Franke
What do the logfiles say? How does the source code looks like? Is it really needed to do checkpointing every 30 seconds? > On 19. Sep 2018, at 08:25, yuvraj singh <19yuvrajsing...@gmail.com> wrote: > > Hi , > > I am doing checkpointing using s3 and rocksdb , > i am doing checkpointing per 30

Re: Checkpointing not happening in Standalone HA mode

2018-07-27 Thread Vinay Patil
Hi Vino, Yes, Job runs successfully, however, no checkpoints are successful. I will update the source Regards, Vinay Patil On Fri, Jul 27, 2018 at 2:00 PM vino yang wrote: > Hi Vinay, > > Oh! You use a collection source? That's the problem. Please use a general > source like Kafka or others.

Re: Checkpointing not happening in Standalone HA mode

2018-07-27 Thread vino yang
Hi Vinay, Oh! You use a collection source? That's the problem. Please use a general source like Kafka or others. Maybe your checkpoint has not be triggered, your job has stopped. Thanks, vino. 2018-07-27 16:07 GMT+08:00 Vinay Patil : > Hi Vino, > > Yes I am enabling checkpoint in the code as f

Re: Checkpointing not happening in Standalone HA mode

2018-07-27 Thread Vinay Patil
Hi Vino, Yes I am enabling checkpoint in the code as follows : StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(",,getJobConfiguration(),jarPath"); env.enableCheckpointing(1000); env.setSateBackend(new FsStateBackend("file:///")); env.getCheckpointConfig().s

Re: Checkpointing not happening in Standalone HA mode

2018-07-25 Thread vino yang
Hi Vinay: Did you call specific config API refer to this documentation[1]; Can you share your job program and JM Log? Or the JM log contains the log message like this pattern "Triggering checkpoint {} @ {} for job {}."? [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/

Re: Checkpointing not happening in Standalone HA mode

2018-07-25 Thread Vinay Patil
Hi Chesnay, No error in the logs. That is why I am not able to understand why checkpoints are getting triggered. Regards, Vinay Patil On Wed, Jul 25, 2018 at 4:36 PM Chesnay Schepler wrote: > Please check the job- and taskmanager logs for anything suspicious. > > On 25.07.2018 12:33, Vinay Pa

Re: Checkpointing not happening in Standalone HA mode

2018-07-25 Thread Chesnay Schepler
Can you provide us with the job code? I assume that checkpointing runs properly if you submit the same job to a normal cluster? On 25.07.2018 13:15, Vinay Patil wrote: No error in the logs. That is why I am not able to understand why checkpoints are not getting triggered. Regards, Vinay Pat

Re: Checkpointing not happening in Standalone HA mode

2018-07-25 Thread Vinay Patil
No error in the logs. That is why I am not able to understand why checkpoints are not getting triggered. Regards, Vinay Patil On Wed, Jul 25, 2018 at 4:44 PM Vinay Patil wrote: > Hi Chesnay, > > No error in the logs. That is why I am not able to understand why > checkpoints are getting trigger

Re: Checkpointing not happening in Standalone HA mode

2018-07-25 Thread Chesnay Schepler
Please check the job- and taskmanager logs for anything suspicious. On 25.07.2018 12:33, Vinay Patil wrote: Hi, I am starting the cluster using bootstrap application where in I am calling Job Manager and Task Manager main class to form the cluster. The HA cluster is formed correctly and I am

Re: Checkpointing in Flink 1.5.0

2018-07-11 Thread Data Engineer
checkpointing directory using glusterfs >> volume mount (thus file access protocol file:///) was working fine till >> 1.4.2 for us. So we like to understand where the breakage happened in >> 1.5.0. >> >> Can you please mention me the relevant source code files related to

Re: Checkpointing in Flink 1.5.0

2018-07-10 Thread Sampath Bhat
elated to > rocksdb “custom file path” parsing logic? We would be interested to > investigate this. > > > > I also observed below in the log – > > > > Config uses deprecated configuration key > 'state.backend.rocksdb.checkpointdir' instead of proper key > '

Re: Checkpointing in Flink 1.5.0

2018-07-04 Thread Chesnay Schepler
s deprecated configuration key 'state.backend.rocksdb.checkpointdir' instead of proper key 'state.backend.rocksdb.localdir' Regards, Shaswata *From:*Chesnay Schepler [mailto:ches...@apache.org] *Sent:* Tuesday, July 03, 2018 5:52 PM *To:* Data Engineer *Cc:* user@flink.apache

Re: Checkpointing in Flink 1.5.0

2018-07-04 Thread Chesnay Schepler
caldir' Regards, Shaswata *From:*Chesnay Schepler [mailto:ches...@apache.org] *Sent:* Tuesday, July 03, 2018 5:52 PM *To:* Data Engineer *Cc:* user@flink.apache.org *Subject:* Re: Checkpointing in Flink 1.5.0 The code appears to be working fine. This may happen because you're using a G

RE: Checkpointing in Flink 1.5.0

2018-07-03 Thread Jash, Shaswata (Nokia - IN/Bangalore)
swata From: Chesnay Schepler [mailto:ches...@apache.org] Sent: Tuesday, July 03, 2018 5:52 PM To: Data Engineer Cc: user@flink.apache.org Subject: Re: Checkpointing in Flink 1.5.0 The code appears to be working fine. This may happen because you're using a GlusterFS volume. The RocksDBSta

Re: Checkpointing in Flink 1.5.0

2018-07-03 Thread Chesnay Schepler
The code appears to be working fine. This may happen because you're using a GlusterFS volume. The RocksDBStateBackend uses java Files internally (NOT nio Paths), which AFAIK only work properly against the plain local file-system. The GlusterFS nio FIleSystem implementation also explicitly does

Re: Checkpointing in Flink 1.5.0

2018-07-03 Thread Chesnay Schepler
Thanks. Looks like RocksDBStateBackend.setDbStoragePaths has some custom file path parsing logic, will probe it a bit to see what the issue is. On 03.07.2018 13:45, Data Engineer wrote: 2018-07-03 11:30:35,703 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -

Re: Checkpointing in Flink 1.5.0

2018-07-03 Thread Data Engineer
2018-07-03 11:30:35,703 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 2018-07-03 11:30:35,705 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionCluste

Re: Checkpointing in Flink 1.5.0

2018-07-03 Thread Chesnay Schepler
Doesn't sound like intended behavior, can you give us the stacktrace? On 03.07.2018 13:17, Data Engineer wrote: The Flink documentation says that we need to specify the filesystem type (file://, hdfs://) when configuring the rocksdb backend dir. https://ci.apache.org/projects/flink/flink-docs-

Re: Checkpointing on cluster shutdown

2018-06-05 Thread Chesnay Schepler
If a TM goes down any data generated after the last successful checkpoint cannot be guaranteed to be consistent across the cluster. Hence, this data is discarded and we go back to the last known consistent state, the last checkpoint that was successfully created. On 05.06.2018 13:06, Garvit Sha

Re: Checkpointing on cluster shutdown

2018-06-05 Thread Garvit Sharma
But job should be terminated gracefully. Why is this behavior not there? On Tue, Jun 5, 2018 at 4:19 PM, Chesnay Schepler wrote: > No checkpoint will be triggered when the cluster is shutdown. For this > case you will have to manually trigger a savepoint. > > If a TM goes down it does not create

Re: Checkpointing on cluster shutdown

2018-06-05 Thread Chesnay Schepler
No checkpoint will be triggered when the cluster is shutdown. For this case you will have to manually trigger a savepoint. If a TM goes down it does not create a checkpoint. IN these cases the job will be restarted from the last successful checkpoint. On 05.06.2018 12:01, Data Engineer wrote:

Re: Checkpointing when reading from files?

2018-06-05 Thread Fabian Hueske
Hi, The continuous file source is split into two components. 1) A split generator that monitors a directory and generates splits when a new file is observed, and 2) reading tasks that receive splits and read the referenced files. I think this is the code that generates input splits which are dist

Re: Checkpointing when reading from files?

2018-05-27 Thread Padarn Wilson
I'm a bit confused about this too actually. I think the above would work as a solution if you want to continuously monitor a directory, but for a "PROCESS_ONCE" readFile source I don't think you will get a checkpoint emitted indicating the end of the stream. My understanding of this is that there

Re: Checkpointing when reading from files?

2018-05-21 Thread Amit Jain
Hi Alex, StreamingExecutionEnvironment#readFile is a helper function to create file reader data streaming source. It uses ContinuousFileReaderOperator and ContinuousFileMonitoringFunction internally. As both file reader operator and monitoring function uses checkpointing so is readFile [1], you c

Re: Checkpointing barriers

2018-04-24 Thread Fabian Hueske
Hi Alex, That's correct. The n refers to the n-th checkpoint. The checkpoint ID is important, because operators need to align the barriers to ensure that they consumed all inputs up to the point, where the barriers were injected into the stream. Each operator checkpoints its own state. For sources

Re: Checkpointing barriers

2018-04-24 Thread Alexander Smirnov
ok, I got it. Barrier-n is an indicator or n-th checkpoint. My first impression was that barriers are carrying offset information, but it was wrong. Thanks for unblocking ;-) Alex

Re: Checkpointing barriers

2018-04-23 Thread Ted Yu
barrier n appearing in all the streams serves as synchronization point. As explained in the subsequent paragraph: bq. Otherwise, it would mix records that belong to snapshot *n*and with records that belong to snapshot *n+1*. Cheers On Mon, Apr 23, 2018 at 7:21 AM, Alexander Smirnov < alexander.

Re: Checkpointing with RocksDB as statebackend

2017-06-30 Thread Aljoscha Krettek
et this working. >>> >>> @Stefan or @Stephan : can you please help in resolving this issue >>> >>> Regards, >>> Vinay Patil >>> >>> On Thu, Jun 29, 2017 at 6:01 PM, gerryzhou [via Apache Flink User Mailing >>> List archive.]

Re: Checkpointing with RocksDB as statebackend

2017-06-29 Thread Vinay Patil
t; >>> @Stefan or @Stephan : can you please help in resolving this issue >>> >>> Regards, >>> Vinay Patil >>> >>> On Thu, Jun 29, 2017 at 6:01 PM, gerryzhou [via Apache Flink User >>> Mailing List archive.] wrote: >>> >>

Re: Checkpointing with RocksDB as statebackend

2017-06-29 Thread Vinay Patil
a similar problem in flink 1.3.0 with rocksdb. I wonder >>> how to use FRocksDB as you mentioned above. Thanks. >>> >>> -- >>> If you reply to this email, your message will be added to the discussion >>> below: >>> http

Re: Checkpointing with RocksDB as statebackend

2017-06-29 Thread Aljoscha Krettek
ailing >> List archive.] > <mailto:ml+s2336050n1406...@n4.nabble.com>> wrote: >> Hi, Vinay, >> I observed a similar problem in flink 1.3.0 with rocksdb. I wonder how >> to use FRocksDB as you mentioned above. Thanks. >> >> If you reply to th

Re: Checkpointing with RocksDB as statebackend

2017-06-29 Thread Vinay Patil
ioned above. Thanks. >> >> -- >> If you reply to this email, your message will be added to the discussion >> below: >> http://apache-flink-user-mailing-list-archive.2336050.n4. >> nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend

Re: Checkpointing with RocksDB as statebackend

2017-06-29 Thread Aljoscha Krettek
sdb. I wonder how > to use FRocksDB as you mentioned above. Thanks. > > If you reply to this email, your message will be added to the discussion > below: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p14063.

Re: Checkpointing with RocksDB as statebackend

2017-06-29 Thread Aljoscha Krettek
d above. Thanks. > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p14063.html > Sent from the Apache Flink User Mailing List archive. mailing list archive at > Nabble.com.

Re: Checkpointing with RocksDB as statebackend

2017-06-29 Thread Vinay Patil
e. Thanks. > > -- > If you reply to this email, your message will be added to the discussion > below: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re- > Checkpointing-with-RocksDB-as-statebackend-tp11752p14063.html > To start

Re: Checkpointing with RocksDB as statebackend

2017-06-29 Thread gerryzhou
Hi, Vinay, I observed a similar problem in flink 1.3.0 with rocksdb. I wonder how to use FRocksDB as you mentioned above. Thanks. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend

Re: Checkpointing with RocksDB as statebackend

2017-06-29 Thread Vinay Patil
disk. >>> > >>> > I have attached the snapshot for reference. >>> > >>> > Also the data processed till now is only 17GB and above 120GB memory is >>> > getting used. >>> > >>> > Is there any change wrt RocksDB configurations >>> > >>> > <http://apache-flink-user-mailing-list-archive.2336050.n4.na >>> bble.com/file/n14013/TM_Memory_Usage.png> >>> > >>> > Regards, >>> > Vinay Patil >>> > >>> > >>> > >>> > -- >>> > View this message in context: http://apache-flink-user-maili >>> ng-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with- >>> RocksDB-as-statebackend-tp11752p14013.html >>> > Sent from the Apache Flink User Mailing List archive. mailing list >>> archive at Nabble.com. >>> >>> >> >

Re: Checkpointing with RocksDB as statebackend

2017-06-28 Thread SHI Xiaogang
> Also the data processed till now is only 17GB and above 120GB memory is >> > getting used. >> > >> > Is there any change wrt RocksDB configurations >> > >> > <http://apache-flink-user-mailing-list-archive.2336050.n4.na >> bble.com/file/n14013/TM_Mem

Re: Checkpointing with RocksDB as statebackend

2017-06-28 Thread Vinay Patil
4013/TM_Memory_Usage.png> > > > > Regards, > > Vinay Patil > > > > > > > > -- > > View this message in context: http://apache-flink-user-maili > ng-list-archive.2336050.n4.nabble.com/Re-Checkpointing- > with-RocksDB-as-statebackend-tp11752p14013.html > > Sent from the Apache Flink User Mailing List archive. mailing list > archive at Nabble.com. > >

Re: Checkpointing with RocksDB as statebackend

2017-06-28 Thread Aljoscha Krettek
3/TM_Memory_Usage.png> > > > Regards, > Vinay Patil > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p14013.html > Sent from the Apache Flink User Mailing List archive. mailing list archive at > Nabble.com.

Re: Checkpointing with RocksDB as statebackend

2017-06-27 Thread vinay patil
RocksDB configurations <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n14013/TM_Memory_Usage.png> Regards, Vinay Patil -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-stateb

Re: Checkpointing with RocksDB as statebackend

2017-06-26 Thread Vinay Patil
;> >>>>>>>>>>> Best, >>>>>>>>>>> Stefan >>>>>>>>>>> >>>>>>>>>>> Am 14.03.2017 um 15:31 schrieb Vishnu Viswanath <[hidden email

Re: Checkpointing SIGSEGV

2017-05-29 Thread Stefan Richter
FYI, I created this JIRA https://issues.apache.org/jira/browse/FLINK-6761 to track the problem of large merging state per key. I might also bring this the the RocksDB issue tracker and then figure out how to solve this. > Am 27.05.2017 um 20:28

Re: Checkpointing SIGSEGV

2017-05-27 Thread Stefan Richter
Hi, this is a known and currently „accepted“ problem in Flink which can only happen when a task manager is already going down, e.g. on cancelation. It happens when the RocksDB object was already disposed (as part of the shutdown procedure) but there is still a pending timer firing, and in the p

Re: Checkpointing SIGSEGV

2017-05-26 Thread Stefan Richter
Flink’s version is hosted here: https://github.com/dataArtisans/frocksdb > Am 26.05.2017 um 19:59 schrieb Jason Brelloch : > > Thanks for looking into this Stefan. We are moving forward with a different > strategy for now. If I want to take a look at

Re: Checkpointing SIGSEGV

2017-05-26 Thread Jason Brelloch
Thanks for looking into this Stefan. We are moving forward with a different strategy for now. If I want to take a look at this, where do I go to get the Flink version of RocksDB? On Fri, May 26, 2017 at 1:06 PM, Stefan Richter wrote: > I forgot to mention that you need to run this with Flink’s

Re: Checkpointing SIGSEGV

2017-05-26 Thread Stefan Richter
I forgot to mention that you need to run this with Flink’s version of RocksDB, as the stock version is already unable to perform the inserts because their implementation of merge operator has a performance problem. Furthermore, I think a higher multiplicator than *2 is required on num (and/or a

Re: Checkpointing SIGSEGV

2017-05-26 Thread Stefan Richter
I played a bit around with your info and this looks now like a general problem in RocksDB to me. Or more specifically, between RocksDB and the JNI bridge. I could reproduce the issue with the following simple test code: File rocksDir = new File("/tmp/rocks"); final Options options = new Options(

Re: Checkpointing SIGSEGV

2017-05-26 Thread Jason Brelloch
~2 GB was the total state in the backend. The total number of keys in the test is 10 with an approximately even distribution of state across keys, and parallelism of 1 so all keys are on the same taskmanager. We are using ListState and the number of elements per list would be about 50. On Fr

Re: Checkpointing SIGSEGV

2017-05-26 Thread Stefan Richter
Hi, what means „our state“ in this context? The total state in the backend or the state under one key? If you use, e.g. list state, I could see that the state for one key can grow above 2GB, but once we retrieve the state back from RocksDB as Java arrays (in your stacktrace, when making a check

Re: Checkpointing SIGSEGV

2017-05-26 Thread Robert Metzger
Hi Jason, This error is unexpected. I don't think its caused by insufficient memory. I'm including Stefan into the conversation, he's the RocksDB expert :) On Thu, May 25, 2017 at 4:15 PM, Jason Brelloch wrote: > Hey guys, > > We are running into a JVM crash on checkpointing when our rocksDB st

Re: Checkpointing with RocksDB as statebackend

2017-03-28 Thread Stefan Richter
t would cause a >> FileNotFound exception which would fail the checkpoint. >> >> >> >> Stephan, >> >> >> >> Currently my aws fork contains some very specific assumptions about the >> pipeline that will in general only hold for my pipeline. Th

  1   2   >