with relatively small states.
>>> The RocksDB number with remove() commented out is surprisingly high, but
>>> since it's causing correctness issues under some circumstances I would
>>> abandon that approach.
>>>
> > I probably need more time to ensure we definitely see all the keys, but
> > that looks very good.
>
> Yeah, correctness is key here so waiting on your results.
>
> BR,
> G
>
>
> On Wed, Feb 12, 2025 at 11:56 AM Jean-Marc Paulin wrote:
>
>> Hi Gabor,
>
> Yeah, I think this is more like the 1.x and 2.x incompatibility.
> I've just opened the PR against 1.20 which you can cherry-pick here [1].
>
> [1] https://github.com/apache/flink/pull/26145
>
> BR,
> G
>
>
> On Tue, Feb 11, 2025 at 4:19 PM Jean-Marc Paulin wrote:
>>> - Old execution time: 25M27.126737S
>>> - New execution time: 1M19.602042S
>>> In short: ~95% performance gain.
>>>
>>> G
>>>
>>>
>>> On Thu, Feb 6, 2025 at 9:06 AM Gabor Somogyi wrote:
>>>
>>> In short, when you don't care about
>>> multiple KeyedStateReaderFunction.readKey calls then you're on the safe
>>> side.
>>>
>>> G
>>>
>>> On Wed, Feb 5, 2025 at 6:27 P
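
For context, a minimal sketch of what being "on the safe side" can look like: a
KeyedStateReaderFunction whose output is a pure function of key and state, so a
duplicate readKey call just emits a duplicate record that is trivial to
deduplicate downstream. The state name "count" and the types are assumptions
for illustration, not from this thread.

```
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class CountReader extends KeyedStateReaderFunction<String, String> {

    private ValueState<Long> countState;

    @Override
    public void open(Configuration parameters) {
        // Must mirror the descriptor used by the job that wrote the savepoint.
        countState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void readKey(String key, Context ctx, Collector<String> out) throws Exception {
        // Output depends only on (key, state), so repeated invocations for
        // the same key are harmless.
        out.collect(key + "=" + countState.value());
    }
}
```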
s in this area. It's still a question whether the
>>> tested patch is the right approach
>>> but at least we've touched the root cause.
>>>
>>> The next step on my side is to have a deep dive and understand all the
>>> aspects why remove is t
>> 2025-02-04 13:39:14,704 INFO o.a.f.e.s.r.FlinkTestStateReaderJob
>> [] - Execution time: PT6.930659S
>>
>> Needless to say, the bigger number is the State Processor API.
>>
>> G
>>
>>
>> On Tue, Feb 4, 2025 at 1:40 PM Jean-Marc
on a custom written operator, which opens the state
> info for you.
>
> The only drawback is that you must know the keyBy range... this can be
> problematic but if you can do it it's a win :)
>
> G
>
>
> On Tue, Feb 4, 2025 at 12:16 PM Jean-Marc Paulin wrote:
>
>
> and intended to eliminate the slowness. Since we don't know what exactly
> causes the slowness, the new FRocksDB-8.10.0 can also be an improvement.
>
> All in all it will take some time to sort this out.
>
> [1] https://issues.apache.org/jira/browse/FLINK-37109
>
>
What would be the best approach to read a savepoint while minimising memory
consumption? We just need to transform it into something else for
investigation.
Our Flink 1.20 streaming job uses the HashMap backend and is spread over 6
task slots in 6 pods (under k8s). Savepoints are saved on S3. A s
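
A minimal sketch of the usual State Processor API route for this, assuming a
hypothetical operator uid "my-operator", a made-up savepoint path, and a reader
like the CountReader sketched earlier. Pointing the reader at the RocksDB
backend keeps the working set on disk rather than on the heap, which is the
usual lever for memory consumption when reading a HashMap-backend savepoint.

```
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.state.api.OperatorIdentifier;
import org.apache.flink.state.api.SavepointReader;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReadSavepointJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Reading with RocksDB spills state to disk instead of keeping it
        // all in memory as the HashMap backend would.
        SavepointReader savepoint = SavepointReader.read(
                env,
                "s3://flink-application/savepoints/savepoint-123456",  // made-up path
                new EmbeddedRocksDBStateBackend());

        DataStream<String> state = savepoint.readKeyedState(
                OperatorIdentifier.forUid("my-operator"),  // uid of the stateful operator
                new CountReader());

        state.print();  // replace with a sink writing the transformed form
        env.execute("read-savepoint");
    }
}
```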
> latest checkpoint instead of a savepoint (even though it is the latest),
> and all the checkpoints will be properly cleaned up. Did you trigger the
> savepoint periodically? And did you clean that up manually?
>
>
> Best,
> Zakelly
>
> On Thu, Jan 2, 2025 at 3:50 PM Jean-Ma
>> Is s3://flink-application/checkpoints increasing? And have you set the state
>> TTL?
>>
>>
>> Best,
>> Zakelly
>>
>> On Tue, Dec 31, 2024 at 7:58 PM Jean-Marc Paulin
>> wrote:
>>
>>> Hi,
>>>
>>> We are on Flink 1.20/Java17 running in a
Hi,
We are on Flink 1.20/Java17 running in a k8s environment, with checkpoints
enabled on S3 and the following checkpoint options:
execution.checkpointing.dir: s3://flink-application/checkpoints
execution.checkpointing.externalized-checkpoint-retention: DELETE_ON_CANCELLATION
executio
I am trying to use 3rd party serializers as per
https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/fault-tolerance/serialization/third_party_serializers/
but the code sample does not compile
```
Configuration config = new Configuration();
// register the class of the serializer as serializer for a type
```
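
A hedged sketch of a version that should compile, assuming the problem is that
`PipelineOptions.SERIALIZATION_CONFIG` expects a `List<String>` rather than a
plain string; the type and serializer class names below are placeholders, and
the mapping syntax follows the linked documentation page.

```
import java.util.List;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.PipelineOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SerializerConfigExample {
    public static void main(String[] args) {
        Configuration config = new Configuration();

        // One "type: serializer-spec" mapping per list entry.
        config.set(PipelineOptions.SERIALIZATION_CONFIG, List.of(
                "org.example.MyCustomType: {type: kryo, kryo-type: registered,"
                        + " class: org.example.MyCustomKryoSerializer}"));

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(config);
    }
}
```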
Hi,
We are running a Flink 1.19 application under heavy load, and our job manager
failed to restart with exceptions like:
"Transient association error (association remains
live) org.apache.pekko.remote.OversizedPayloadException: Discarding oversized
payload sent to
Actor[pekko.ssl.tcp://fl...@a
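
If useful: `OversizedPayloadException` means a single RPC message exceeded the
remoting frame size. Assuming default settings, the knob to raise in the Flink
configuration is pekko.framesize (default 10485760b), for example:
pekko.framesize: 31457280b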
Hi,
We did further testing and verification on this and it seems to only be an issue
in HA with ZooKeeper. In that scenario I believe we resume from a checkpoint. I
suppose the solution would be to cancel the job based on Flink 1.19 and then
resubmit it with Flink 1.20.
Yes we recompiled our app
Hi, we have an error when Flink 1.20 resumes a job from a savepoint that was
written with Flink 1.19 (we are in an upgrade scenario here). And we hit the
error `Caused by: java.lang.ClassNotFoundException:
org.apache.flink.runtime.jobgraph.RestoreMode`, see below
So I get that `org.apache.flink.
scenario.
But maybe there isn't any.
Best regards
JM
From: Zhanghao Chen
Sent: Tuesday, June 11, 2024 03:56
To: Jean-Marc Paulin ; user@flink.apache.org
Subject: [EXTERNAL] Re: Failed to resume from HA when the checkpoint has been
deleted.
Hi, In this case
Hi,
We have a 1.19 Flink streaming job, with HA enabled (ZooKeeper),
checkpoint/savepoint in S3. We had an outage and now the jobmanager keeps
restarting. We think this is because it reads the job id to be restarted from
ZooKeeper, but because we lost our S3 storage as part of the outage it cannot
f
Hi,
We use S3 as our datastore for checkpoints/savepoints, and following an S3 error
we saw this exception:
```
java.io.IOException: GET operation failed: Could not transfer error message
    at org.apache.flink.runtime.blob.BlobClient.getInternal(BlobClient.java:231)
    at org.apache.
```
___
From: Yanfei Lei
Sent: Monday, April 22, 2024 03:28
To: Jean-Marc Paulin
Cc: user@flink.apache.org
Subject: [EXTERNAL] Re: Flink 1.18: Unable to resume from a savepoint with
error InvalidPidMappingException
Hi JM,
Yes, `InvalidPidMappingException` occurs because the tra
Hi,
we use Flink 1.18 with the Kafka Sink, and we enabled `EXACTLY_ONCE` on one of
our Kafka sinks. We set the transaction timeout to 15 minutes. When we try to
restore from a savepoint, way after that 15 minute window, Flink enters a
RESTARTING loop. We see the error:
```
{
  "exception": {
```
Hi,
We used to run with 3 task managers with numberOfTaskSlots = 2, so altogether
we had 6 task slots and our application used them all. Trying to increase
throughput, we increased the number of task managers to 6, so now we have 12
task slots altogether. However our application still only
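
A guess, since the message is truncated here: if the job's default parallelism
is still 6, the six new slots simply stay idle, because with slot sharing a job
only occupies as many slots as its maximum operator parallelism. A minimal
sketch of raising it (the value 12 assumes all slots should be used):

```
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Default parallelism for all operators that don't override it;
        // with 12 task slots this lets the job occupy every slot.
        env.setParallelism(12);
    }
}
```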
Hi,
Trying to update the Kafka connector in my project and I am missing a class. Is
the doc missing a dependency on flink-connector-base?
```
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-base</artifactId>
    <scope>compile</scope>
</dependency>
```
I added it and it works. I think it's required, but I would have expected this
in the doc.