Re: High availability data clean up

2021-10-21 Thread Vijay Bhaskar
In HA mode the configMap will be retained after deletion of the deployment: https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/ ( Refer High availability data clean up) On Fri, Oct 22, 2021 at 8:13 AM Yangze Guo wrote: > For application mode, when the jo

Re: Why we need again kubernetes flink operator?

2021-10-21 Thread Vijay Bhaskar
Understood that we have kubernetes HA configuration where we specify s3:// or HDFS:/// persistent storage, as mentioned here: https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/ Regards Bhaskar On Fri, Oct 22, 2021 at 10:47 AM Vijay Bhaskar wrote: > All, > I

Re: Any issues with reinterpretAsKeyedStream when scaling partitions?

2021-10-21 Thread Dan Hill
Probably worth restating. I was hoping to avoid a lot of shuffles in my Flink job. A bunch of the data is already keyed by the userId (and stays keyed by it most of the Flink job). I'm not sure I understand why having two inputs is an issue in my case. Does Flink send the same keys to different

Re: Any issues with reinterpretAsKeyedStream when scaling partitions?

2021-10-21 Thread Dan Hill
I found the following link about this. Still looks applicable. In my case, I don't need to do a broadcast join. https://www.alibabacloud.com/blog/flink-how-to-optimize-sql-performance-using-multiple-input-operators_597839 On Thu, Oct 21, 2021 at 9:51 PM Dan Hill wrote: > Interesting. Thanks,

Why we need again kubernetes flink operator?

2021-10-21 Thread Vijay Bhaskar
All, I have used flink upto last year using flank 1.9. That time we built our own cluster using zookeeper and monitoring jobs. Now I am revisiting different applications. Found that community has come up with this native kubernetes deployment: https://ci.apache.org/projects/flink/flink-docs-mast

Re: Any issues with reinterpretAsKeyedStream when scaling partitions?

2021-10-21 Thread Dan Hill
Interesting. Thanks, JING ZHANG! On Mon, Oct 18, 2021 at 12:16 AM JING ZHANG wrote: > Hi Dan, > > I'm guessing I violate the "The second operator needs to be single-input > (i.e. no TwoInputOp nor union() before)" part. > I think.you are right. > > Do you want to remove shuffle of two inputs in

Re: how to delete all rows one by one in batch execution mode; shutdown cluster after all tasks finished

2021-10-21 Thread Caizhi Weng
Hi! For problem 1, Flink does not support deleting specific records. As you're running a batch job, I suggest creating a new table based on the new filter condition. Even if you can delete the old records you'll still have to generate the new ones, so why not generate them directly into a new plac

Re: High availability data clean up

2021-10-21 Thread Yangze Guo
For application mode, when the job finished normally or be canceled, the ConfigMaps will be cleanup. For session mode, when you stop the session through [1], the ConfigMaps will be cleanup. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/native_

Re: [ANNOUNCE] Apache Flink 1.13.3 released

2021-10-21 Thread JING ZHANG
Thank Chesnay, Martijn and every contributor for making this happen! Thomas Weise 于2021年10月22日周五 上午12:15写道: > Thanks for making the release happen! > > On Thu, Oct 21, 2021 at 5:54 AM Leonard Xu wrote: > > > > Thanks to Chesnay & Martijn and everyone who made this release happen. > > > > > > >

Re: Multiple sources ordering

2021-10-21 Thread JING ZHANG
Hi Kurt, Are you looking for Hybridsource[1], please see more information in document[1]. Hope it helps. [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/datastream/hybridsource/ Best, JING ZHANG Kurtis Walker 于2021年10月22日周五 上午9:33写道: > I have a case where my Flink jo

Multiple sources ordering

2021-10-21 Thread Kurtis Walker
I have a case where my Flink job needs to consume multiple sources. I have a topic in Kafka where the order of consuming is important. Because the cost of S3 is much less than storage on Kafka, we have a job that sinks to S3. The topic in Kafka can then retain just 3 days worth of data. My j

SplitEnumeratorContext callAsync() cleanup

2021-10-21 Thread Mason Chen
Hi all, I was wondering how to cancel a task that is enqueued by the callAsync() method, the one that takes in a time interval. For example, the KafkaSource uses this for topic partition discovery. It would be straightforward if the API returned the underlying future so that a process can cancel i

RE: Troubleshooting checkpoint timeout

2021-10-21 Thread Alexis Sarda-Espinosa
I would really appreciate more fine-grained information regarding the factors that can affect a checkpoint’s: * Sync duration * Async duration * Alignment duration * Start delay Otherwise those metrics don’t really help me know in which areas to look for issues. Regards, Alexi

Re: [ANNOUNCE] Apache Flink 1.13.3 released

2021-10-21 Thread Thomas Weise
Thanks for making the release happen! On Thu, Oct 21, 2021 at 5:54 AM Leonard Xu wrote: > > Thanks to Chesnay & Martijn and everyone who made this release happen. > > > > 在 2021年10月21日,20:08,Martijn Visser 写道: > > > > Thank you Chesnay, Leonard and all contributors! > > > > On Thu, 21 Oct 2021 a

Re: Huge backpressure when using AggregateFunction with Session Window

2021-10-21 Thread Ori Popowski
Thanks for taking the time to answer this. - You're correct that the SimpleAggregator is not used in the job setup. I didn't copy the correct piece of code. - I understand the overhead involved. But I do not agree with the O(n^2) complexity. Are you implying that Vector append is O(n)

Re: [ANNOUNCE] Apache Flink 1.13.3 released

2021-10-21 Thread Leonard Xu
Thanks to Chesnay & Martijn and everyone who made this release happen. > 在 2021年10月21日,20:08,Martijn Visser 写道: > > Thank you Chesnay, Leonard and all contributors! > > On Thu, 21 Oct 2021 at 13:40, Jingsong Li > wrote: > Thanks, Chesnay & Martijn > > 1.13.3 r

Re: Impossible to get pending file names/paths on checkpoint?

2021-10-21 Thread Fabian Paul
Hi Preston, To be honest I am a bit surprised that it worked with Flink 1.10. I was under the impression that we did never support writing to the Azure Filesystem. Unfortunately, it is very hard for us to implement a solid FileSystem implementation for Azure in Flink. We would need someone with

Re: [ANNOUNCE] Apache Flink 1.13.3 released

2021-10-21 Thread Martijn Visser
Thank you Chesnay, Leonard and all contributors! On Thu, 21 Oct 2021 at 13:40, Jingsong Li wrote: > Thanks, Chesnay & Martijn > > 1.13.3 really solves many problems. > > Best, > Jingsong > > On Thu, Oct 21, 2021 at 6:46 PM Konstantin Knauf > wrote: > > > > Thank you, Chesnay & Martijn, for mana

Re: [ANNOUNCE] Apache Flink 1.13.3 released

2021-10-21 Thread Jingsong Li
Thanks, Chesnay & Martijn 1.13.3 really solves many problems. Best, Jingsong On Thu, Oct 21, 2021 at 6:46 PM Konstantin Knauf wrote: > > Thank you, Chesnay & Martijn, for managing this release! > > On Thu, Oct 21, 2021 at 10:29 AM Chesnay Schepler > wrote: > > > The Apache Flink community is v

Re: [ANNOUNCE] Apache Flink 1.13.3 released

2021-10-21 Thread Konstantin Knauf
Thank you, Chesnay & Martijn, for managing this release! On Thu, Oct 21, 2021 at 10:29 AM Chesnay Schepler wrote: > The Apache Flink community is very happy to announce the release of > Apache Flink 1.13.3, which is the third bugfix release for the Apache > Flink 1.13 series. > > Apache Flink® i

[ANNOUNCE] Apache Flink 1.13.3 released

2021-10-21 Thread Chesnay Schepler
The Apache Flink community is very happy to announce the release of Apache Flink 1.13.3, which is the third bugfix release for the Apache Flink 1.13 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data stream

Re: Huge backpressure when using AggregateFunction with Session Window

2021-10-21 Thread Ori Popowski
I didn't try to reproduce it locally since this job reads 14K events per second. I am using Flink version 1.12.1 and RocksDB state backend. It also happens with Flink 1.10. I tried to profile with JVisualVM and I didn't see any bottleneck. All the user functions almost didn't take any CPU time. O