Re: OperatorStateFromBackend can't complete initialisation because of high number of savepoint files reads

2024-10-17 Thread Mate Czagany
Hi William, I think your findings are correct: I could easily reproduce the issue with snapshot-compression set to false, but I was unable to reproduce it with snapshot-compression set to true. When using compressed state, the available() call will return the number of bytes in the Snappy internal buffer that …
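
Because the reply could not reproduce the issue with compression enabled, turning it on is one thing to try. Below is a minimal sketch that renders the setting as a config-file line. It assumes the option key is execution.checkpointing.snapshot-compression, which is our understanding of the key in recent Flink releases; confirm it for your version. Compression is presumably decided when a snapshot is written, so savepoints that already exist uncompressed would not change.

```python
from __future__ import annotations


def render_flink_conf(options: dict[str, str]) -> str:
    """Render flat Flink options as 'key: value' lines for a config file."""
    return "\n".join(f"{key}: {value}" for key, value in options.items())


options = {
    # Snappy-compress snapshot (checkpoint/savepoint) state. The option key
    # is assumed from recent Flink releases; check your version's docs.
    "execution.checkpointing.snapshot-compression": "true",
}
print(render_flink_conf(options))
```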

Re: Stopping the flink 1.18 program with savepoint seems to fail with timeout

2024-10-11 Thread Mate Czagany
Hi, In the background, this is a REST call to Flink. If creating the savepoint takes too long, you might hit a timeout. You can increase the timeout with the client.timeout configuration option [1]. You can also use the --detached option for the stop action, which will return once it receives a trigger ID from …
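
Here is a minimal sketch of the detached variant. It assembles the CLI arguments for a stop-with-savepoint, assuming the command is run from the Flink distribution directory. The job ID and savepoint path are placeholders. With --detached the command returns once the trigger ID is received, so you would check that the savepoint completed some other way, for example through the REST API. The other option is to raise client.timeout in the Flink configuration and keep the blocking call.

```python
from __future__ import annotations

import shlex


def flink_stop_argv(job_id: str, savepoint_dir: str | None = None,
                    detached: bool = True) -> list[str]:
    """Build argv for 'flink stop' with a savepoint, optionally detached."""
    argv = ["./bin/flink", "stop"]
    if savepoint_dir is not None:
        # Only needed if no default savepoint directory is configured.
        argv += ["--savepointPath", savepoint_dir]
    if detached:
        # Return as soon as the savepoint trigger ID is received instead of
        # waiting (and possibly timing out) for the savepoint to finish.
        argv.append("--detached")
    argv.append(job_id)
    return argv


# Placeholder job ID and path; pass the list to subprocess.run(...) to execute.
argv = flink_stop_argv("<job-id>", "s3://my-bucket/savepoints")
print(shlex.join(argv))
```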

Re: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread Mate Czagany
Oops, I forgot the links, sorry about that. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker [2] https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-python-example/Dockerfile Mate

Re: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread Mate Czagany
Hi, You can refer to the example Dockerfile in the Flink docs [1], and you can also take a look at the example in the Flink Kubernetes Operator repo [2]. If I am not mistaken, the second Dockerfile won't work because it is missing all the Flink libraries. Regards, Mate … wrote (on 2024 …

Re: Flink Kubernetes Operator - How can I use a jar that is hosted on a private maven repo for a FlinkSessionJob?

2024-05-12 Thread Mate Czagany
… -27483 Regards, Mate Czagany. Nathan T. A. Lewis wrote (on Thu, 9 May 2024, 19:00): > Hello, > > I am trying to run a Flink Session Job with a jar that is hosted on a > maven repository in Google's Artifact Registry. > > The first thing I tried was to just specify …

Re: Flink Kubernetes Operator - Deadlock when Cluster Cleanup Fails

2024-02-13 Thread Mate Czagany
Hi, I have opened a JIRA ticket [1] because I had the same error (AlreadyExists) last week, and I could pinpoint the problem to the TaskManagers still being alive when the new Deployment is created. In native mode, we only check for the JobManagers when waiting for the cluster to shut down, in contrast to standalone …

Re: changing the 'flink-main-container' name

2023-10-20 Thread Mate Czagany
Hi, By naming the container flink-main-container, you tell Flink which container spec it should use for the Flink containers. If you change the name, Flink won't know which container spec to use for the Flink container and will probably treat it as just a sidecar container, and there will still be …
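
Here is a minimal Python sketch of the naming rule. It splits a pod template's containers the way the reply describes: the container named flink-main-container becomes the spec for Flink's own container, and everything else is treated as a sidecar. It only mimics the described behaviour and is not Flink's implementation. The function name, the sidecar, and the environment variable are hypothetical.

```python
from __future__ import annotations

# Name that marks the container spec used for Flink's own container.
MAIN_CONTAINER_NAME = "flink-main-container"


def split_containers(pod_template: dict) -> tuple[dict | None, list[dict]]:
    """Return (main container spec or None, list of sidecar specs)."""
    containers = pod_template.get("spec", {}).get("containers", [])
    main = None
    sidecars = []
    for container in containers:
        if main is None and container.get("name") == MAIN_CONTAINER_NAME:
            main = container
        else:
            # Any container with a different name is left as a sidecar.
            sidecars.append(container)
    return main, sidecars


template = {
    "spec": {
        "containers": [
            {"name": "flink-main-container", "env": [{"name": "FOO", "value": "bar"}]},
            {"name": "log-shipper", "image": "busybox"},  # hypothetical sidecar
        ]
    }
}
main, sidecars = split_containers(template)
print(main["name"], [c["name"] for c in sidecars])

# After a rename, nothing matches and the spec ends up as a "sidecar".
renamed = {"spec": {"containers": [{"name": "my-flink"}]}}
print(split_containers(renamed))
```

The renamed case is the situation the reply warns about. The customised spec is not picked up for the Flink container and is handled like any other extra container.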

Re: Bloom Filter for Rocksdb

2023-10-20 Thread Mate Czagany
Hi, There have been no reports of this setting causing any issues. I would guess it's off by default because it can increase memory usage by an unpredictable amount. I would say feel free to enable it; from what you've said, I also think it would improve the performance …
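
A back-of-the-envelope estimate makes the memory point concrete. A bloom filter costs roughly a fixed number of bits per stored key, so its footprint grows with the number of RocksDB entries. For Flink state that depends on key cardinality and is hard to predict up front. The sketch assumes 10 bits per key, which we believe is the default of state.backend.rocksdb.bloom-filter.bits-per-key. The feature itself is toggled by state.backend.rocksdb.use-bloom-filter. Verify both against your Flink version's docs.

```python
# Rough estimate of bloom filter memory for RocksDB state.
# Assumption: size ~= entries * bits_per_key / 8 bytes; this ignores
# block/partition overhead and how filter blocks are cached or pinned.

def bloom_filter_bytes(num_entries: int, bits_per_key: float = 10.0) -> float:
    """Approximate total bloom filter size in bytes."""
    return num_entries * bits_per_key / 8


MIB = 1024 * 1024
for entries in (1_000_000, 100_000_000):
    size_mib = bloom_filter_bytes(entries) / MIB
    print(f"{entries:,} entries -> ~{size_mib:.1f} MiB of filter data")
```

With these assumptions, 100 million entries come to roughly 119 MiB of filter data. That growth with state size is consistent with the reply's guess about unpredictable memory use.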

Re: Flink HDFS with Flink Kubernetes Operator

2023-10-19 Thread Mate Czagany
Hello, Please look into using 'kubernetes.decorator.hadoop-conf-mount.enabled' [1], which was added for use cases where the user wishes to skip adding these Hadoop mount decorators. It's true by default; if you set it to false, Flink won't add this mount. [1] https://nightlies.apache.org/flink/f …
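
Here is a minimal sketch of where the option could go when using the Flink Kubernetes Operator. The script builds a partial FlinkDeployment manifest with the setting under spec.flinkConfiguration and prints it as JSON, which kubectl also accepts. The deployment name is hypothetical. Required fields such as image, flinkVersion, jobManager and taskManager are deliberately left out.

```python
import json

# Partial FlinkDeployment manifest; other required fields (image,
# flinkVersion, jobManager, taskManager, ...) are omitted for brevity.
deployment = {
    "apiVersion": "flink.apache.org/v1beta1",
    "kind": "FlinkDeployment",
    "metadata": {"name": "hdfs-example"},  # hypothetical name
    "spec": {
        "flinkConfiguration": {
            # Skip the Hadoop configuration mount decorator (default: true).
            # Values in flinkConfiguration are strings.
            "kubernetes.decorator.hadoop-conf-mount.enabled": "false",
        },
    },
}

print(json.dumps(deployment, indent=2))
```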

Re: Apache Atlas - Flink Integration

2023-08-01 Thread Mate Czagany
Hi, Unfortunately, the Atlas hook you've read about is only available in the Cloudera Flink solution and has not been open-sourced. In the future, FLIP-314 [1] might offer a simple way to implement the Atlas integration. Best regards, Mate [1] https://cwiki.apache.org/confluence/display/F …

Re: Questions on S3 File Sink Behavior

2023-03-29 Thread Mate Czagany
Hi, 1. In the case of the S3 FileSystem, Flink uses the multipart upload process [1] for better performance. It might not be obvious at first from the docs, but it's noted at the bottom of the FileSystem page [2]. For more information, you can also check FLINK-9751 and FLINK-9752. 2. In the case of loc …

Re: Table API function and expression vs SQL

2023-03-25 Thread Mate Czagany
Hi, Please also keep in mind that restoring existing Table API jobs from savepoints when upgrading to a newer minor version of Flink (e.g. 1.16 -> 1.17) is not supported, as the topology might change between these versions due to optimizer changes. See here for more information: https://nightlies.a …