Re: Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
I think I managed to create a reproducible example [1]; it seems to be due to the use of window + join + window. When I run the test, I never see the print output, but if I uncomment the part of the watermark generator code that marks it as idle more quickly, it starts working after a while. [1]
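As a rough illustration of why marking a source as idle can unblock a downstream window: an operator with several inputs generally advances its event-time clock to the minimum watermark over its non-idle inputs. The sketch below is a plain-Python toy model of that rule, not Flink code; `InputChannel` and `combined_watermark` are hypothetical names made up for this example, and it says nothing about the actual cause of the problem reported in this thread.

```python
from dataclasses import dataclass


@dataclass
class InputChannel:
    """One input of a multi-input operator (hypothetical model, not a Flink class)."""
    watermark: float = float("-inf")  # no watermark received yet
    idle: bool = False


def combined_watermark(inputs):
    """Return the minimum watermark over the inputs that are not idle.

    Returns None when every input is idle, i.e. there is nothing to advance on.
    """
    active = [ch.watermark for ch in inputs if not ch.idle]
    if not active:
        return None
    return min(active)


left = InputChannel(watermark=1_000)
right = InputChannel()  # silent input: it has never emitted a watermark

# While the silent input counts as active, it holds the combined watermark back.
stalled = combined_watermark([left, right])

# Once the silent input is marked idle, only the active input is considered.
right.idle = True
advanced = combined_watermark([left, right])

print(stalled, advanced)
```

In this model a window downstream of the operator could not fire while `stalled` is the effective watermark, and it can fire once the silent input is excluded.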

Re: Flink Statefun and Feature computation

2022-03-10 Thread Federico D'Ambrosio
Hi Igal, thank you so much for your response. As for [2], I was mainly interested in how the state is stored physically. Looking at the deployment files, I see the following file https://github.com/apache/flink-statefun-playground/blob/main/deployments/k8s/04-statefun/01-statefun-runtime.yaml

Flink argo native k8s application mode

2022-03-10 Thread Eric Berryman
Hello, I'm attempting a Flink Argo native k8s application mode setup. Can someone point me to some documentation on the subject of CD with this Flink setup? Thank you! Eric

RE: Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
I found [1] and [2], which are closed, but could be related? [1] https://issues.apache.org/jira/browse/FLINK-23698 [2] https://issues.apache.org/jira/browse/FLINK-18934 Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Thursday, 10 March 2022 19:27 To: user@flink.apache.org Subject: Interval

Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
Hello, I'm in the process of updating from Flink 1.11.3 to 1.14.3, and it seems the interval join in my pipeline is no longer working. More specifically, I have a sliding window after the interval join, and the window isn't firing. After many tests, I ended up creating a custom operator that ex

Re: Could not stop job with a savepoint

2022-03-10 Thread Vinicius Peracini
Hi Schwalbe! Yes, I'm using RocksDBStateBackend. I guess your suspicion was right, I changed the memory allocator to jemalloc and the issue seems to be gone. Here is what I did to change the memory allocator on EMR: 1. Installed the jemalloc package by using an EMR bootstrap action script: sudo

Re: Using another FileSystem configuration while creating a job

2022-03-10 Thread Gil De Grove
Hey Chesnay, Thanks for the thorough answer, much appreciated. Sorry for the "requesting []..."; it got lost in translation and slipped past my second read-through. The correct verb should have been "asking" :). It was no request to the community at all, sorry again for that. The solution to imple

Re: Using another FileSystem configuration while creating a job

2022-03-10 Thread Chesnay Schepler
> if we want to use two sets of credentials, for example to access two different AWS buckets, that would not be feasible at the moment? That is correct. > As it seems that this limitation is quite an important one, is there a place where we can find this documented? I don't think it is expli

RE: Savepoint API challenged with large savepoints

2022-03-10 Thread Schwalbe Matthias
Thanks, Chesnay, I just created the 3 tickets (in my clumsy way): * FLINK-26584 : State Processor API fails to write savepoints exceeding 5MB * FLINK-26585 : State Processor API: Loadin

Re: Using another FileSystem configuration while creating a job

2022-03-10 Thread Gil De Grove
Hello Chesnay, Thanks for the reply. Your reply makes me wonder: if we want to use two sets of credentials, for example to access two different AWS buckets, that would not be feasible at the moment? One example I have in mind would be to separate the credentials for accessing data vs

Re: Fwd: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-10 Thread Jark Wu
Hi Francesco, Yes. The Hive syntax is a syntax plugin provided by Hive connector. > But right now I don't think It's a good idea adding new features on top, as it will create only more maintenance burden both for Hive developers and for table developers. We are not adding new Hive features, but

Re: Customizing backpressure mechanism for RichParallelSourceFunction

2022-03-10 Thread Chesnay Schepler
It's not possible to send events to sources; data only flows in 1 direction. On 03/03/2022 06:31, Le Xu wrote: Hello! I have a dataflow pipeline built using Flink's RichParallelSourceFunction as parallel sources. I'm wondering if there are any mechanisms that I could use to implement *ack-bas

Re: Using another FileSystem configuration while creating a job

2022-03-10 Thread Chesnay Schepler
The FileSystem class is essentially one big singleton, with only 1 instance of each FileSystem implementation being loaded, shared across all jobs. For that reason we do not support job-specific FileSystem configurations. Note that we generally also don't really support configuring the FileSyst
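A minimal sketch of the consequence described above, assuming a process-wide cache keyed by URI scheme. This is a toy model in plain Python, not Flink's implementation; `ToyFileSystem`, its `get` method, and the `access_key` setting are hypothetical names used only for illustration.

```python
class ToyFileSystem:
    """Toy stand-in for a file system that is loaded once per process."""

    _instances = {}  # scheme -> the single shared instance

    def __init__(self, scheme, config):
        self.scheme = scheme
        self.config = dict(config)

    @classmethod
    def get(cls, scheme, config):
        # The first caller's configuration creates the instance;
        # every later caller receives that same instance unchanged.
        if scheme not in cls._instances:
            cls._instances[scheme] = cls(scheme, config)
        return cls._instances[scheme]


# Two "jobs" in the same process ask for the same scheme with different credentials.
fs_job_a = ToyFileSystem.get("s3", {"access_key": "key-for-job-a"})
fs_job_b = ToyFileSystem.get("s3", {"access_key": "key-for-job-b"})

print(fs_job_a is fs_job_b)           # both jobs share one object
print(fs_job_b.config["access_key"])  # the second job's settings were ignored
```

Under this assumption the second job silently gets the first job's configuration, which is why a shared instance leaves no room for job-specific settings.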

Re: Savepoint API challenged with large savepoints

2022-03-10 Thread Chesnay Schepler
That all sounds very interesting; I'd go ahead with creating tickets. On 08/03/2022 13:43, Schwalbe Matthias wrote: Dear Flink Team, In the last weeks I was faced with a large savepoint (around 40GiB) that contained lots of obsolete data points and overwhelmed our infrastructure (i.e. failed

Re: Problem about adding custom kryo serializer

2022-03-10 Thread Chesnay Schepler
Sounds correct to me; it's not a POJO, so it is treated as a generic type, which goes through Kryo. If you want to be doubly sure that your serializer is in fact used, add a log statement to the read/write methods. On 09/03/2022 08:10, guoliubi...@foxmail.com wrote: Hi, I have an entity class

Re: Fwd: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-10 Thread Francesco Guardiani
> We still need some work to make the Hive dialect purely rely on public APIs, and the Hive connector should be decoupled from the table planner. From the table perspective, I think this is the big pain point at the moment. First of all, when we talk about the Hive syntax, we're really talking about t

Re: Fwd: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-10 Thread Martijn Visser
Thank you Yuxia for volunteering, that's very much appreciated. It would be great if you could create an umbrella ticket for that. It would also be great to get some insights from current Flink and Hive users on which versions are being used. @Jark I would indeed deprecate the old Hive versions in Flink