Hi all,
I am wondering if there is a way to make a Flink job fail (not cancel it)
when one or several checkpoints have failed due to being expired (taking
longer than the timeout)?
I am using Flink 1.9.2 and have set
`setTolerableCheckpointFailureNumber(1)`, which doesn't do the trick.
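In later Flink releases the same knobs are also exposed as configuration keys; a flink-conf.yaml sketch (key names assumed from newer versions, so check the docs for the release you run):

```yaml
# Assumed keys from newer Flink releases; not available as-is in 1.9
execution.checkpointing.timeout: 10min
execution.checkpointing.tolerable-failed-checkpoints: 1
```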
Looking
b to fail when checkpoint
> expired?
>
> Best,
> Congxian
>
>
> On Thu, Apr 2, 2020 at 11:23 PM, Timo Walther wrote:
>
>> Hi Robin,
>>
>> this is a very good observation and maybe even unintended behavior.
>> Maybe Arvid in CC is more familiar with the checkpointing?
>>
affect step 3;
>
> [1] https://issues.apache.org/jira/browse/FLINK-17043
> Best,
> Congxian
>
>
> On Fri, Apr 3, 2020 at 8:35 PM, Robin Cassan wrote:
>
>> Hi Congxian,
>>
>> Thanks for confirming! The reason I want this behavior is because we are
>> currently investiga
Hi all,
We are currently experiencing long checkpointing times on S3 and are
wondering how abnormal it is compared to other loads and setups. Could some
of you share a few stats in your running architecture so we can compare?
Here are our stats:
*Architecture*: 28 TM on Kubernetes, 4 slots per T
Hi all!
I cannot seem to find any setting to limit the number of records appended
in a RocksDBListState that is used when we use SessionWindows with a
ProcessFunction.
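One workaround, absent a built-in limit, is to enforce the bound yourself before anything reaches the list state, e.g. inside an AggregateFunction accumulator. A pure-Java sketch of that guard (the class and its cap are made up for illustration, not a Flink API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical accumulator that refuses elements once a cap is hit,
// so the value merged into RocksDB can never grow unboundedly.
public class CappedListAccumulator<T> {
    private final int maxElements;
    private final List<T> elements = new ArrayList<>();
    private long dropped = 0; // kept around for monitoring/alerting

    public CappedListAccumulator(int maxElements) {
        this.maxElements = maxElements;
    }

    // Returns true if the element was stored, false if the cap dropped it.
    public boolean add(T element) {
        if (elements.size() >= maxElements) {
            dropped++;
            return false;
        }
        elements.add(element);
        return true;
    }

    public List<T> getElements() { return elements; }
    public long getDropped() { return dropped; }
}
```

The idea is that the window merges small, bounded accumulators instead of appending raw elements, so the RocksDB `merge` value stays capped.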
It seems that, for each incoming element, the new element will be appended
to the value with the RocksDB `merge` operator, without
040/java/src/main/java/org/rocksdb/RocksDB.java#L1382
>
> Best
> Yun Tang
> --
> *From:* Robin Cassan
> *Sent:* Friday, May 15, 2020 0:37
> *To:* user
> *Subject:* Protection against huge values in RocksDB List State
>
> Hi all!
u do not have
>> many key-by keys, operator state is a good choice as that is on-heap and
>> lightweight.
>>
>> Best
>> Yun Tang
>> --
>> *From:* Robin Cassan
>> *Sent:* Friday, May 15, 2020 20:59
>> *To:* Yun Tang
>
could pre-partition our
data in Kafka so that Flink can avoid shuffling data before creating
the Session Windows?
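For context on why pre-partitioning in Kafka alone doesn't let Flink skip the shuffle: Flink routes each key through a key group to an operator index, and Kafka's partitioner knows nothing about that mapping. A rough pure-Java sketch of the assignment (hashing simplified as noted in the comments):

```java
// Sketch of Flink's key -> key group -> operator index assignment.
// Real Flink (KeyGroupRangeAssignment) murmur-hashes key.hashCode()
// before the modulo; we use hashCode() directly here, so the key-group
// values below differ from what a real job computes.
public class KeyGroupSketch {

    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Mirrors the documented keyGroup * parallelism / maxParallelism formula.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128, parallelism = 4;
        int kg = keyGroupFor("user-42", maxParallelism);
        System.out.println("keyGroup=" + kg
                + " -> operator=" + operatorIndexFor(kg, maxParallelism, parallelism));
    }
}
```

So unless the Kafka partitioning reproduces exactly this mapping, records still have to be redistributed before the keyed session windows.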
Cheers,
Robin
--
<https://www.contentsquare.com/>
Robin CASSAN
Data Engineer
+33 6 50 77 88 36
5 boulevard de la Madeleine - 75001 Paris
<https://data.sigi
am/experimental.html
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.html
>> Best,
>> Congxian
>>
>>
>> On Sat, Nov 30, 2019 at 12:17 AM, Robin Cassan wrote:
>>
>>> Hi all!
Hi all!
We've happily been running a Flink job in production for a year now, with
the RocksDB state backend and incremental retained checkpointing on S3. We
often release new versions of our jobs, which means we cancel the running
one and submit another while restoring the previous jobId's last re
still in use.
>
> I know this is only a partial answer to your question. I'll try to find
> out more details and extend my answer later.
>
>
> On Thu, Jul 29, 2021 at 2:31 PM Robin Cassan <
> robin.cas...@contentsquare.com> wrote:
>
>> Hi all!
>>
>
ould
> let new job get decoupled with older checkpoints. Do you think that could
> resolve your case?
>
> Best
> Yun Tang
> --
> *From:* Robin Cassan
> *Sent:* Wednesday, September 1, 2021 17:38
> *To:* Robert Metzger
> *Cc:* user
> *Sub
Hey all!
I have a conceptual question on the DataStream API: when using an in-memory
state backend (like the HashMapStateBackend), how can you ensure that the
hashmap won't grow uncontrollably until OutOfMemory happens?
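A common way to bound that risk is to keep large windowed state off the JVM heap by using RocksDB instead; a flink-conf.yaml sketch (keys and paths are illustrative, check the docs for your version):

```yaml
# Sketch: keep state in RocksDB (on disk) rather than the Java heap
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: s3://<bucket>/checkpoints
```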
In my case, I would be consuming from a Kafka topic, into a SessionWindow.
The
tes on objects on the Java heap; however,
> state size is limited by available memory within the cluster. "
>
> If the size of your window state is really huge, you should choose another
> state backend.
> Hope my reply helps you.
>
> Best,
> Yuan
>
>
>
Hi all! We are running a Flink cluster on Kubernetes and deploying a single
job on it through "flink run". Whenever we want to modify the jar, we
cancel the job and run the "flink run" command again, with the new jar and
the retained checkpoint URL from the first run.
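The deploy cycle described above, as a command sketch (job IDs, bucket, and jar names are placeholders):

```shell
# cancel the running job
flink cancel <job-id>

# resubmit the new jar, restoring from the last retained checkpoint
flink run -s s3://<bucket>/checkpoints/<old-job-id>/chk-<n> new-version.jar
```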
This works well, but this ad
oordinator, failover handling, and
> operator coordinator.
>
> But, IMO, Job RollingUpgrade is a worthwhile thing to do.
>
> Best,
> Weihua
>
>
> On Wed, Jun 29, 2022 at 3:52 PM Robin Cassan <
> robin.cas...@contentsquare.com> wrote:
>
>> Hi all! We are r
Hello all!
We have a need where, for specific recovery cases, we would need to
manually reset our Flink Kafka consumer offset to a given date but have the
Flink job restore its state. As I understand, restoring from a checkpoint
necessarily sets the group's offset to the one that was in the checkp
Thanks a lot Alexander and Tzu-Li for your answers, this helps a lot!!
Cheers,
Robin
On Fri, Jul 8, 2022 at 5:40 PM, Tzu-Li (Gordon) Tai wrote:
> Hi Robin,
>
> Apart from what Alexander suggested, I think you could also try the
> following first:
> Let the job use a "new" Kafka source, which y
Hi all!
It seems Akka have announced a licensing change
https://www.lightbend.com/blog/why-we-are-changing-the-license-for-akka
If I understand correctly, this could end up increasing costs a lot for
companies using Flink in production. Do you know if the Flink developers
have any initial reaction a
Thanks a lot for your answers, this is reassuring!
Cheers
On Wed, Sep 7, 2022 at 1:12 PM, Chesnay Schepler wrote:
> Just to squash concerns, we will make sure this license change will not
> affect Flink users in any way.
>
> On 07/09/2022 11:14, Robin Cassan via user wrote:
>
Hello all, hope you're well :)
We are attempting to build a Flink job with minimal and stable latency (as
much as possible) that consumes data from Kafka. Currently our main
limitation happens when our job checkpoints the RocksDB state: backpressure
is applied on the stream, causing latency. I am w
ps.
>
> In long term, I think we probably need to separate the compaction process
> from the internal db and control/schedule the compaction process ourselves
> (compaction takes a good amount of CPU and reduces TPS).
>
> Best.
> Yuan
>
>
>
> On Thu, Oct 13, 2022 at 11
Hello all!
We are trying to bring our Flink job closer to real-time processing and
currently our main issue is latency that happens during checkpoints. Our
job uses RocksDB with periodic checkpoints, which are a few hundred GBs
every 15 minutes. We are trying to reduce the checkpointing duration b
ttps://issues.apache.org/jira/browse/FLINK-28699
> [2]
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/fault-tolerance/checkpointing/#state-backend-incremental
> [3]
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/config
elp to make incremental checkpoint size small and
>>> stable which could make the CPU more stable.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-28699
>>> [2]
>>> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/fault
Hello all!
I am using the Flink Kubernetes Operator and I would like to set the value
for `taskmanager.memory.process.size`. I set the desired value in the
FlinkDeployment resource specs (here, I want 55gb), however it looks like
the value that is effectively passed to the taskmanager is the same
> So basically the spec is a config shorthand, there is no reason to
> override it as you won't get a different behaviour at the end of the day.
>
> Gyula
>
> On Wed, Jun 14, 2023 at 11:55 AM Robin Cassan via user <
> user@flink.apache.org> wrote:
>
>> He
d fraction.
>
> Gyula
>
> On Wed, Jun 14, 2023 at 2:46 PM Robin Cassan <
> robin.cas...@contentsquare.com> wrote:
>
>> Thanks Gyula for your answer! I'm wondering about your claim:
>> > In Flink kubernetes the process is the pod so pod memory is always