RE: Flink restarts on Checkpoint failure

2021-09-01 Thread Schwalbe Matthias
look into when I run into similar situations. Feel free to get back to the mailing list for further clarifications … Thias From: Caizhi Weng Sent: Thursday, 2 September 2021 04:24 To: Daniel Vol Cc: user Subject: Re: Flink restarts on Checkpoint failure Hi! There are a ton of possible

Re: Flink restarts on Checkpoint failure

2021-09-01 Thread Caizhi Weng
Hi! There are a ton of possible reasons for a checkpoint failure. The most likely ones are: * The JVM is busy with garbage collection while performing the checkpoints. This can be checked by looking into the GC logs of a task manager. * The state suddenly becomes quite large due to some

Flink restarts on Checkpoint failure

2021-09-01 Thread Daniel Vol
Hello, I see the following error in my jobmanager log (Flink on EMR): Checking the cluster logs, I see: 2021-08-21 17:17:30,489 [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 1 (type=CHECKPOINT) @ 1629566250303 for job c513e9ebbea4ab72d80b133
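
When comparing checkpoint trigger times against later failures, it can help to pull the fields out of "Triggering checkpoint" lines like the one quoted above. The following is a minimal sketch, assuming the log format matches that single quoted line; other Flink versions or logging layouts may differ. The names `TRIGGER_RE` and `parse_trigger` are hypothetical helpers, not part of Flink, and the job ID in the sample is cut off exactly as it is in the archive snippet.

```python
import re
from datetime import datetime, timezone

# Regex modelled only on the JobManager line quoted in the message above.
# Assumption: the layout "Triggering checkpoint <id> (type=<type>) @ <epoch ms> for job <id>"
# holds for the logs being inspected.
TRIGGER_RE = re.compile(
    r"Triggering checkpoint (?P<id>\d+) \(type=(?P<type>\w+)\) "
    r"@ (?P<ts>\d+) for job (?P<job>[0-9a-f]+)"
)


def parse_trigger(line):
    """Return the checkpoint fields from one log line, or None if it does not match."""
    match = TRIGGER_RE.search(line)
    if match is None:
        return None
    timestamp_ms = int(match.group("ts"))
    return {
        "checkpoint_id": int(match.group("id")),
        "type": match.group("type"),
        "timestamp_ms": timestamp_ms,
        # The value after "@" is treated as epoch milliseconds and shown in UTC.
        "triggered_at": datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc),
        "job_id": match.group("job"),
    }


# Sample line copied from the message; the job ID is truncated in the archive.
sample = (
    "2021-08-21 17:17:30,489 [Checkpoint Timer] INFO "
    "org.apache.flink.runtime.checkpoint.CheckpointCoordinator - "
    "Triggering checkpoint 1 (type=CHECKPOINT) @ 1629566250303 "
    "for job c513e9ebbea4ab72d80b133"
)

if __name__ == "__main__":
    print(parse_trigger(sample))
```

Running `parse_trigger` over every line of a JobManager log and keeping the non-`None` results gives a list of trigger times that can be lined up with the restarts.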

Re: Watermark UI after checkpoint failure

2021-07-20 Thread Dan Hill
It's after a checkpoint failure. I don't know if that includes a restore from a checkpoint. I'll take some screenshots when the jobs hit the failure again. All of my currently running jobs are healthy right now and haven't hit a checkpoint failure. On Sun, Jul 18, 20

Re: Watermark UI after checkpoint failure

2021-07-18 Thread Dawid Wysakowicz
ink job hits a checkpoint failure (e.g. timeout) and > then has successful checkpoints, the flink job appears to be in a bad > state.  E.g. some of the operators that previously had a watermark > start showing "no watermark".  The jobs proceed very slowly. > > Is there docum

Re: Watermark UI after checkpoint failure

2021-07-18 Thread Caizhi Weng
Hi! This does not sound like expected behavior. Could you share your code / SQL and Flink configuration so that others can help diagnose the issue? Dan Hill wrote on Mon, Jul 19, 2021 at 1:41 PM: > After my dev flink job hits a checkpoint failure (e.g. timeout) and then > has successful checkpoint

Watermark UI after checkpoint failure

2021-07-18 Thread Dan Hill
After my dev flink job hits a checkpoint failure (e.g. timeout) and then has successful checkpoints, the flink job appears to be in a bad state. E.g. some of the operators that previously had a watermark start showing "no watermark". The jobs proceed very slowly. Is there documentatio

Re: Iterate Operator Checkpoint Failure

2021-04-16 Thread Lu Niu
Hi Fabian, Thanks for replying. I created this ticket. It describes how to reproduce the issue using code in the flink-example package: https://issues.apache.org/jira/browse/FLINK-22326 Best Lu On Fri, Apr 16, 2021 at 1:25 AM Fabian Paul wrote: > Hi Lu, > > Can you provide some more detailed logs of what

Re: Iterate Operator Checkpoint Failure

2021-04-16 Thread Fabian Paul
Hi Lu, Can you provide some more detailed logs of what happened during the checkpointing phase? If possible, please enable debug logs. It would also be great to know whether you have implemented your own Iterator Operator or what kind of Flink program you are trying to execute. Best,

Iterate Operator Checkpoint Failure

2021-04-15 Thread Lu Niu
Hi Flink Users, After migrating from Flink 1.9.1 to Flink 1.11, we noticed that a job always fails on checkpoint if it uses the Iterator Operator, regardless of whether we use unaligned checkpoints. Those jobs don't have checkpoint issues in 1.9. Is this a known issue? Thank you! Best Lu

Re: Frequently checkpoint failure, could make the flink sql state not clear?

2020-01-16 Thread Congxian Qiu
state, if the Flink task's checkpoints always fail, is the key state still cleared > by the timer? > Thanks for your reply. >

Frequently checkpoint failure, could make the flink sql state not clear?

2020-01-16 Thread LakeShen
Hi community, I am using Flink SQL and have set the retention time. As far as I know, Flink sets a timer per key to clear its state. If the Flink task's checkpoints always fail, is the key state still cleared by the timer? Thanks for your reply.

Re: checkpoint failure suddenly even state size less than 1 mb

2019-09-06 Thread Sushant Sawant
> *Sent:* Tuesday, August 27, 2019 15:01 > *To:* user > *Subject:* Re: checkpoint failure suddenly even state size less than 1 mb > > Hi team, > Any help or suggestions? We have now stopped all input in Kafka, so there > is no processing and no sink, but checkpointing is f

Re: checkpoint failure suddenly even state size is into 10 mb around

2019-09-02 Thread Biao Liu
Kafka source shows high back pressure. > 2. Sudden checkpoint failures for an entire day, until restart. > > My job does the following: > a. Read from Kafka > b. Async I/O to an external system > c. Dump into Cassandra and Elasticsearch > > Checkpointing uses the file system. > This

Re: checkpoint failure in forever loop suddenly even state size less than 1 mb

2019-09-02 Thread Fabian Hueske
09 pengcheng...@bonc.com.cn, < >> pengcheng...@bonc.com.cn> wrote: >> >>> Hi,What's your checkpoint config? >>> >>> -- >>> pengcheng...@bonc.com.cn >>> >>> >>> *From:* Sushant Sawant &

Re: checkpoint failure suddenly even state size less than 1 mb

2019-08-30 Thread Yun Tang
t: Re: checkpoint failure suddenly even state size less than 1 mb Hi team, Any help or suggestions? We have now stopped all input in Kafka, so there is no processing and no sink, but checkpointing is still failing. Is it the case that once a checkpoint fails, it keeps failing until the job is restarted? Help appreciated. T

Re: checkpoint failure suddenly even state size less than 1 mb

2019-08-27 Thread Sushant Sawant
p.m., "Sushant Sawant" wrote: Hi all, I'm facing two issues which I believe are correlated. 1. Kafka source shows high back pressure. 2. Sudden checkpoint failures for an entire day, until restart. My job does the following: a. Read from Kafka b. Async I/O to an external system c. Dumpin

checkpoint failure suddenly even state size is into 10 mb around

2019-08-23 Thread Sushant Sawant
Hi all, I'm facing two issues which I believe are correlated. 1. Kafka source shows high back pressure. 2. Sudden checkpoint failures for an entire day, until restart. My job does the following: a. Read from Kafka b. Async I/O to an external system c. Dump into Cassandra and Elasticsearch

Re: Checkpoint failure

2019-07-14 Thread Biao Liu
the performance of state backend, etc. Navneeth Krishnan wrote on Sun, Jul 14, 2019 at 5:01 AM: > Hi All, > > Any pointers on the below checkpoint failure scenario. Appreciate all the > help. Thanks > > Thanks > > On Sun, Jul 7, 2019 at 9:23 PM Navneeth Krishnan > wrote: > &g

Re: Checkpoint failure

2019-07-13 Thread Navneeth Krishnan
Hi All, Any pointers on the checkpoint failure scenario below? I appreciate all the help. Thanks On Sun, Jul 7, 2019 at 9:23 PM Navneeth Krishnan wrote: > Hi All, > > Occasionally I run into a failed-checkpoint error where 2 or 3 consecutive > checkpoints fail after running

Checkpoint failure

2019-07-07 Thread Navneeth Krishnan
Hi All, Occasionally I run into a failed-checkpoint error where 2 or 3 consecutive checkpoints fail after the job has been running for a minute, and then it recovers. This delays processing of the incoming data, since a huge amount of data is buffered during the failed checkpoints. I don't see any erro

Re: Link checkpoint failure issue

2018-06-05 Thread Chesnay Schepler
Can you provide us with the TaskManager logs? On 05.06.2018 12:30, James (Jian Wu) [FDS Data Platform] wrote: Hi, I am using a Flink streaming continuous query. Scenario: a Kafka connector consumes a topic, and the stream incrementally calculates data over a 24-hour window, using processingTime a

Link checkpoint failure issue

2018-06-05 Thread James (Jian Wu) [FDS Data Platform]
Hi, I am using a Flink streaming continuous query. Scenario: a Kafka connector consumes a topic, and the stream incrementally calculates data over a 24-hour window, using processingTime as the TimeCharacteristic. I am using RocksDB as the StateBackend, the file system is HDFS, and the checkpoint interval is 5 minu