from:"Jean-Marc Paulin"

BackendBuildingException when trying to restore heap backend from S3 : The specified key does not exist. (Service: Amazon S3; Status Code: 404...)

2025-05-22 Thread Jean-Marc Paulin

Hi, We are running Flink 1.20.1, and see a strange issue when trying to read a savepoint from minio/S3 to a hashmap backend. At first we'd think the file is not there, but when checking the S3 bucket the file is there. This is not systematic and only happens from time to time. We think it's an env

Is there a way to list the operators in a savepoints?

2025-04-11 Thread Jean-Marc Paulin

Hi there, I've got this odd issue with our Flink 1.20 application. We had to make a change to our savepoints structure (essentially we need to change the keys we used in the keyBy) so we wrote an application to migrate the savepoint from the oldkey to the new key. However when I run the migrated

Re: How to read a savepoint fast without exploding the memory

2025-02-17 Thread Jean-Marc Paulin

for > the whole community so if you have some info please share. > > G > > > On Wed, Feb 12, 2025 at 2:40 PM Jean-Marc Paulin > wrote: > >> Hi Gabor >> >> Glad we helped, but thanks to you for working on this. The performance >> improvement you mad

Re: How to read a savepoint fast without exploding the memory

2025-02-12 Thread Jean-Marc Paulin

ith relatively small states. >>> RocksDB commenting out the remove() is a bit surprisingly high but since >>> it's causing correctness issues under some circumstances I would abandon >>> that. >>> >>> > I probably need more time to ensure we definite

Re: How to read a savepoint fast without exploding the memory

2025-02-12 Thread Jean-Marc Paulin

> > I probably need more time to ensure we definitely see all the keys, but > that looks very good. > > Yeah, correctness is key here so waiting on your results. > > BR, > G > > > On Wed, Feb 12, 2025 at 11:56 AM Jean-Marc Paulin wrote: > >> Hi Gabor, >

Re: How to read a savepoint fast without exploding the memory

2025-02-12 Thread Jean-Marc Paulin

: > Yeah, I think this is more like the 1.x and 2.x incompatibility. > I've just opened the PR against 1.20 which you can cherry-pick here [1]. > > [1] https://github.com/apache/flink/pull/26145 > > BR, > G > > > On Tue, Feb 11, 2025 at 4:19 PM Jean-Marc Paulin wrote

Re: How to read a savepoint fast without exploding the memory

2025-02-12 Thread Jean-Marc Paulin

: > Yeah, I think this is more like the 1.x and 2.x incompatibility. > I've just opened the PR against 1.20 which you can cherry-pick here [1]. > > [1] https://github.com/apache/flink/pull/26145 > > BR, > G > > > On Tue, Feb 11, 2025 at 4:19 PM Jean-Marc Paulin wr

Re: How to read a savepoint fast without exploding the memory

2025-02-11 Thread Jean-Marc Paulin

xecution time: 25M27.126737S >>> - New execution time: 1M19.602042S >>> In short: ~95% performance gain. >>> >>> G >>> >>> >>> On Thu, Feb 6, 2025 at 9:06 AM Gabor Somogyi >>> wrote: >>> >>>> In short, when yo
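
The "~95%" figure quoted in this snippet can be checked from the two execution times. The following is a minimal Java sketch, assuming both values are ISO-8601 durations whose `PT` prefix is missing from the snippet; the class and method names are illustrative and do not come from the thread.

```java
import java.time.Duration;

// Illustrative helper (not from the thread): relative time saved between two runs.
class SpeedupCheck {

    // Convert a Duration to fractional seconds.
    static double seconds(Duration d) {
        return d.toNanos() / 1e9;
    }

    // Percentage of the old execution time that the new run saves.
    static double gainPercent(Duration oldTime, Duration newTime) {
        return 100.0 * (1.0 - seconds(newTime) / seconds(oldTime));
    }

    public static void main(String[] args) {
        // Values from the snippet, with the "PT" prefix assumed.
        Duration oldTime = Duration.parse("PT25M27.126737S");
        Duration newTime = Duration.parse("PT1M19.602042S");
        System.out.printf("old=%.3fs new=%.3fs gain=%.1f%%%n",
                seconds(oldTime), seconds(newTime), gainPercent(oldTime, newTime));
    }
}
```

Under that assumption the old run takes about 1527.1 s and the new run about 79.6 s, a saving of roughly 94.8%, which is consistent with the "~95%" stated in the message.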

Re: How to read a savepoint fast without exploding the memory

2025-02-11 Thread Jean-Marc Paulin

:06 AM Gabor Somogyi >> wrote: >> >>> In short, when you don't care about >>> multiple KeyedStateReaderFunction.readKey calls then you're on the safe >>> side. >>> >>> G >>> >>> On Wed, Feb 5, 2025 at 6:27 P

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Jean-Marc Paulin

s in this area. It's still a question whether the >>> tested patch is the right approach >>> but at least we've touched the root cause. >>> >>> The next step on my side is to have a deep dive and understand all the >>> aspects why remove is t

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Jean-Marc Paulin

>> 2025-02-04 13:39:14,704 INFO o.a.f.e.s.r.FlinkTestStateReaderJob >> [] - Execution time: PT6.930659S >> >> Don't need to mention that the bigger is the processor API. >> >> G >> >> >> On Tue, Feb 4, 2025 at 1:40 PM Jean-Marc

Re: How to read a savepoint fast without exploding the memory

2025-02-04 Thread Jean-Marc Paulin

on a custom written operator, which opens the state > info for you. > > The only drawback is that you must know the keyBy range... this can be > problematic but if you can do it it's a win :) > > G > > > On Tue, Feb 4, 2025 at 12:16 PM Jean-Marc Paulin wrote: > >

Re: How to read a savepoint fast without exploding the memory

2025-02-04 Thread Jean-Marc Paulin

h > and intended to eliminate the slowness. Since we don't know what exactly > causes the slowness the new Frocksdb-8.10.0 can be also an improvement. > > All in all it will take some time to sort this out. > > [1] https://issues.apache.org/jira/browse/FLINK-37109 > >

How to read a savepoint fast without exploding the memory

2025-02-04 Thread Jean-Marc Paulin

What would be the best approach to read a savepoint and minimise the memory consumption? We just need to transform it into something else for investigation. Our Flink 1.20 streaming job is using HashMap backend, and is spread over 6 task slots in 6 pods (under k8s). Savepoints are saved on S3. A s

Re: Q: How to best configure checkpoint to ensure they do not fill-up the storage?

2025-01-03 Thread Jean-Marc Paulin

> latest checkpoint instead of a savepoint (even though it is the latest), > and all the checkpoints will be properly cleaned up. Did you trigger the > savepoint periodically? And did you clean that up manually? > > > Best, > Zakelly > > On Thu, Jan 2, 2025 at 3:50 PM Jean-Ma

Re: Q: How to best configure checkpoint to ensure they do not fill-up the storage?

2025-01-01 Thread Jean-Marc Paulin

-application/checkpoints increasing? And have you set the state >> TTL? >> >> >> Best, >> Zakelly >> >> On Tue, Dec 31, 2024 at 7:58 PM Jean-Marc Paulin >> wrote: >> >>> Hi, >>> >>> We are on Flink 1.20/Java17 running in a

Q: How to best configure checkpoint to ensure they do not fill-up the storage?

2024-12-31 Thread Jean-Marc Paulin

Hi, We are on Flink 1.20/Java17 running in a k8s environment, with checkpoints enabled on S3 and the following checkpoint options: execution.checkpointing.dir: s3://flink-application/checkpoints execution.checkpointing.externalized-checkpoint-retention: DELETE_ON_CANCELLATION executio

How to programmatically register a 3rd party serializer (Flink 1.20)

2024-12-05 Thread Jean-Marc Paulin

I am trying to use 3rd party serializers as per https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/fault-tolerance/serialization/third_party_serializers/ but the code sample does not compile ``` Configuration config = new Configuration(); // register the class of the se

Meaning behind the pekko.framesize configuration property and how to reduce its size

2024-10-03 Thread Jean-Marc Paulin

Hi, We are running a Flink 1.19 application under heavy load, and our job manager failed to restart with exceptions like "Transient association error (association remains live)org.apache.pekko.remote.OversizedPayloadException: Discarding oversized payload sent to Actor[pekko.ssl.tcp://fl...@a

Error class RestoreMode not found after upgrading from Flink 1.19 to 1.20

2024-08-23 Thread Jean-Marc Paulin

Hi, We did further testing and verification on this and it seems to only be an issue in HA with ZooKeeper. In that scenario I believe we resume from a checkpoint. I suppose the solution would be to cancel the job based on Flink 1.19 and then resubmit it with Flink 1.20. Yes we recompiled our app

Error class RestoreMode not found after upgrading from Flink 1.19 to 1.20

2024-08-22 Thread Jean-Marc Paulin

Hi, we have an error when Flink 1.20 resumes a job from a savepoint that was written with Flink 1.19 (we are in an upgrade scenario here). We hit the error `Caused by: java.lang.ClassNotFoundException: org.apache.flink.runtime.jobgraph.RestoreMode`, see below. So I get that `org.apache.flink.

Re: Failed to resume from HA when the checkpoint has been deleted.

2024-06-11 Thread Jean-Marc Paulin

scenario. But maybe there isn't any. Best regards JM From: Zhanghao Chen Sent: Tuesday, June 11, 2024 03:56 To: Jean-Marc Paulin ; user@flink.apache.org Subject: [EXTERNAL] Re: Failed to resume from HA when the checkpoint has been deleted. Hi, In this case

Failed to resume from HA when the checkpoint has been deleted.

2024-06-10 Thread Jean-Marc Paulin

Hi, We have a 1.19 Flink streaming job, with HA enabled (ZooKeeper), checkpoint/savepoint in S3. We had an outage and now the jobmanager keeps restarting. We think it is because it reads the job id to be restarted from ZooKeeper, but because we lost our S3 Storage as part of the outage it cannot f

Saw a java.lang.ClassNotFoundException: com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException

2024-05-09 Thread Jean-Marc Paulin

Hi, We use S3 as our datastore for checkpoint/savepoints, and following an S3 error we saw that exception: ``` java.io.IOException: GET operation failed: Could not transfer error message at org.apache.flink.runtime.blob.BlobClient.getInternal(BlobClient.java:231) at org.apache.

RE: Flink 1.18: Unable to resume from a savepoint with error InvalidPidMappingException

2024-04-23 Thread Jean-Marc Paulin

___ From: Yanfei Lei Sent: Monday, April 22, 2024 03:28 To: Jean-Marc Paulin Cc: user@flink.apache.org Subject: [EXTERNAL] Re: Flink 1.18: Unable to resume from a savepoint with error InvalidPidMappingException Hi JM, Yes, `InvalidPidMappingException` occurs because the tra

Flink 1.18: Unable to resume from a savepoint with error InvalidPidMappingException

2024-04-19 Thread Jean-Marc Paulin

Hi, we use Flink 1.18 with Kafka Sink, and we enabled `EXACTLY_ONCE` on one of our Kafka sinks. We set the transaction timeout to 15 minutes. When we try to restore from a savepoint, well after that 15-minute window, Flink enters a RESTARTING loop. We see the error: ``` { "exception": {

Q: Not all the task slots are used. Are we missing a setting somewhere?

2024-02-23 Thread Jean-Marc Paulin

Hi, We used to run with 3 task managers with numberOfTaskSlots = 2. So altogether we had 6 task slots and our application used them all. Trying to increase throughput, we increased the number of task managers to 6. So now we have 12 task slots altogether. However our application still only
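
The slot counts in this message follow from multiplying task managers by slots per task manager. The Java sketch below reproduces that arithmetic and also illustrates a general Flink behaviour: with default slot sharing, a job occupies no more slots than its highest operator parallelism. The parallelism value of 6 is an assumption chosen purely for illustration; the snippet does not state the job's parallelism, and the class and method names are hypothetical.

```java
// Illustrative slot arithmetic (names and the parallelism value are assumptions).
class SlotMath {

    // Total slots offered by the cluster.
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // With default slot sharing, a job needs one slot per parallel subtask
    // of its widest operator, capped by what the cluster offers.
    static int slotsUsed(int totalSlots, int maxOperatorParallelism) {
        return Math.min(totalSlots, maxOperatorParallelism);
    }

    public static void main(String[] args) {
        int assumedParallelism = 6; // assumption, not stated in the message

        int before = totalSlots(3, 2); // 3 task managers x 2 slots
        int after = totalSlots(6, 2);  // 6 task managers x 2 slots

        System.out.println("before: " + slotsUsed(before, assumedParallelism) + " of " + before);
        System.out.println("after:  " + slotsUsed(after, assumedParallelism) + " of " + after);
    }
}
```

If the parallelism were left at 6 while the cluster grew to 12 slots, only 6 slots would be occupied; whether that matches this deployment cannot be determined from the truncated message.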

Is the kafka-connector doc missing a dependency on flink-connector-base

2023-12-04 Thread Jean-Marc Paulin

Hi, Trying to update the Kafka connector in my project and I am missing a class. Is the doc missing a dependency on flink-connector-base? <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-connector-base</artifactId> <scope>compile</scope> </dependency> I added it and it works. I think that's required but I would have expected this i

28 matches
