Chaining the creation of a WatermarkStrategy doesn't work?

2020-07-07 Thread Niels Basjes
Hi, I'm migrating some of my code to Flink 1.11 and I ran into something I find strange. This works WatermarkStrategy watermarkStrategy = WatermarkStrategy .forBoundedOutOfOrderness(Duration.of(1, ChronoUnit.MINUTES)); watermarkStrategy .withTimestampAssigner((SerializableTimestampAssigner)

Re: [ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Benchao Li
Congratulations! Thanks Zhijiang & Piotr for the great work as release managers. Rui Li 于2020年7月8日周三 上午11:38写道: > Congratulations! Thanks Zhijiang & Piotr for the hard work. > > On Tue, Jul 7, 2020 at 10:06 PM Zhijiang > wrote: > >> The Apache Flink community is very happy to announce the rele

Re: [ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Rui Li
Congratulations! Thanks Zhijiang & Piotr for the hard work. On Tue, Jul 7, 2020 at 10:06 PM Zhijiang wrote: > The Apache Flink community is very happy to announce the release of > Apache Flink 1.11.0, which is the latest major release. > > Apache Flink® is an open-source stream processing framew

Re: [ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Wesley
Nice news. Congrats! Leonard Xu wrote: Congratulations! Thanks Zhijiang and Piotr for the great work, and thanks everyone involved! Best, Leonard Xu

Re: Check pointing for simple pipeline

2020-07-07 Thread Yun Tang
Hi Prasanna Using incremental checkpoint is always better than not as this is faster and less memory consumed. However, incremental checkpoint is only supported by RocksDB state-backend. Best Yun Tang From: Prasanna kumar Sent: Tuesday, July 7, 2020 20:43 To: d

Re: Re:[ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Yun Tang
Congratulations to every who involved and thanks for Zhijiang and Piotr's work as release manager. From: chaojianok Sent: Wednesday, July 8, 2020 10:51 To: Zhijiang Cc: dev ; user@flink.apache.org ; announce Subject: Re:[ANNOUNCE] Apache Flink 1.11.0 released

Re: [ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Leonard Xu
Congratulations! Thanks Zhijiang and Piotr for the great work, and thanks everyone involved! Best, Leonard Xu

Re: [ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Jingsong Li
Congratulations! Thanks Zhijiang and Piotr as release managers, and thanks everyone. Best, Jingsong On Wed, Jul 8, 2020 at 10:51 AM chaojianok wrote: > Congratulations! > > Very happy to make some contributions to Flink! > > > > > > At 2020-07-07 22:06:05, "Zhijiang" wrote: > > The Apache Fli

Re:[ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread chaojianok
Congratulations! Very happy to make some contributions to Flink! At 2020-07-07 22:06:05, "Zhijiang" wrote: The Apache Flink community is very happy to announce the release of Apache Flink 1.11.0, which is the latest major release. Apache Flink® is an open-source stream processin

Re: How to ensure that job is restored from savepoint when using Flink SQL

2020-07-07 Thread shadowell
Hi Fabian, Thanks for your information! Actually, I am not clear about the mechanism of auto-generated IDs in Flink SQL and the mechanism of how does the operator state mapping back from savepoint. I hope to get some detail information by giving an example bellow. I have two sql as samples:

Re: [ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Yangze Guo
Thanks, Zhijiang and Piotr. Congrats to everyone involved! Best, Yangze Guo On Wed, Jul 8, 2020 at 10:19 AM Jark Wu wrote: > > Congratulations! > Thanks Zhijiang and Piotr for the great work as release manager, and thanks > everyone who makes the release possible! > > Best, > Jark > > On Wed, 8

Re: [ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Jark Wu
Congratulations! Thanks Zhijiang and Piotr for the great work as release manager, and thanks everyone who makes the release possible! Best, Jark On Wed, 8 Jul 2020 at 10:12, Paul Lam wrote: > Finally! Thanks for Piotr and Zhijiang being the release managers, and > everyone that contributed to t

Re: [Third-party Tool] Flink memory calculator

2020-07-07 Thread Yangze Guo
Hi, there, As Flink 1.11.0 released, we provide a new calculator[1] for this version. Feel free to try it and any feedback or suggestion is welcomed! [1] https://github.com/KarmaGYZ/flink-memory-calculator/blob/master/calculator-1.11.sh Best, Yangze Guo On Wed, Apr 1, 2020 at 9:45 PM Yangze Gu

Re: [ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Dian Fu
Thanks Piotr and Zhijiang for the great work and everyone who contributed to this release! Regards, Dian > 在 2020年7月8日,上午10:12,Paul Lam 写道: > > Finally! Thanks for Piotr and Zhijiang being the release managers, and > everyone that contributed to the release! > > Best, > Paul Lam > >> 2020年7

Re: [ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Paul Lam
Finally! Thanks for Piotr and Zhijiang being the release managers, and everyone that contributed to the release! Best, Paul Lam > 2020年7月7日 22:06,Zhijiang 写道: > > The Apache Flink community is very happy to announce the release of Apache > Flink 1.11.0, which is the latest major release. > >

Re: Manual allocation of slot usage

2020-07-07 Thread Xintong Song
Hi Mu, Regarding your questions. - The feature `spread out tasks evenly across task managers` is introduced in Flink 1.10.0, and backported to Flink 1.9.2, per the JIRA ticket [1]. That means if you configure this option in Flink 1.9.0, it should not take any effect. - Please be awa

Re: Manual allocation of slot usage

2020-07-07 Thread Yangze Guo
Hi, Mu, AFAIK, this feature is added to 1.9.2. If you use 1.9.0, would you like to upgrade your Flink distribution? Best, Yangze Guo On Tue, Jul 7, 2020 at 8:33 PM Mu Kong wrote: > > Hi, Guo, > > Thanks for helping out. > > My application has a kafka source with 60 subtasks(parallelism), and we

Re: Heterogeneous or Dynamic Stream Processing

2020-07-07 Thread Rob Shepherd
Thank you for the excellent clarifications. I couldn't quite figure out how to map the above to my domain. Nevertheless i have a working demo that performs the following pseudo code: Let's say that each "channel" has slightly different stream requirements and we can look up the list of operations

TaskManager docker image for Beam WordCount failing with ClassNotFound Exception

2020-07-07 Thread Avijit Saha
Hi, I am trying the run the Beam WordCount example on Flink runner using docker container-s for 'Jobcluster' and 'TaskManager'. When I put the Beam Wordcount custom jar in the /opt/flink/usrlib/ dir - the 'taskmanager' docker image fails at runtime with ClassNotFound Exception for the following:

Re: Heterogeneous or Dynamic Stream Processing

2020-07-07 Thread Arvid Heise
Hi Rob, 1. When you start a flink application, you actually just execute a Java main called the driver. This driver submits a job graph to the job manager, which executes the job. Since the driver is an ordinary Java program that uses the Flink API, you can compose the job graph in any way you wan

FlinkKinesisProducer blocking ?

2020-07-07 Thread Vijay Balakrishnan
Hi, current setup. Kinesis stream 1 -> Kinesis Analytics Flink -> Kinesis stream 2 | > Firehose Delivery stream Curl eror: org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.LogInputStreamReader - [2020-07-02 15:22:32.203053] [0x07f4][0x7ffbced15700] [err

Re: Heterogeneous or Dynamic Stream Processing

2020-07-07 Thread Rob Shepherd
Very helpful thank you Arvid. I've been reading up but I'm not sure I grasp all of that just yet. Please may I ask for clarification? 1. Could I summarise correctly that I may build a list of functions from an SQL call which can then be looped over? This looping sounds appealing and you are righ

Re: Decompressing Tar Files for Batch Processing

2020-07-07 Thread Austin Cawley-Edwards
On Tue, Jul 7, 2020 at 10:53 AM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hey Xiaolong, > > Thanks for the suggestions. Just to make sure I understand, are you saying > to run the download and decompression in the Job Manager before executing > the job? > > I think another way to e

Re: Decompressing Tar Files for Batch Processing

2020-07-07 Thread Austin Cawley-Edwards
Hey Chesnay, Thanks for the advice, and easy enough to do it in a separate process. Best, Austin On Tue, Jul 7, 2020 at 10:29 AM Chesnay Schepler wrote: > I would probably go with a separate process. > > Downloading the file could work with Flink if it is already present in > some supported fi

Re: Heterogeneous or Dynamic Stream Processing

2020-07-07 Thread Arvid Heise
Hi Rob, In the past I used a mixture of configuration and template queries to achieve a similar goal (I had only up to 150 of these jobs per application). My approach was not completely dynamic as you have described but rather to compose a big query from a configuration during the start of the app

Re: Decompressing Tar Files for Batch Processing

2020-07-07 Thread Chesnay Schepler
I would probably go with a separate process. Downloading the file could work with Flink if it is already present in some supported filesystem. Decompressing the file is supported for selected formats (deflate, gzip, bz2, xz), but this seems to be an undocumented feature, so I'm not sure how us

[ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Zhijiang
The Apache Flink community is very happy to announce the release of Apache Flink 1.11.0, which is the latest major release. Apache Flink(r) is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is

Re: Timeout when using RockDB to handle large state in a stream app

2020-07-07 Thread Felipe Gutierrez
I figured out that for my stream job the best was just to use the default MemoryStateBackend. I load a table from a file of 725MB in a UDF. I am also not using Flink ListState since I don't have to change the values of this table. i only do a lookup. The only thing that I need was more memory for

Heterogeneous or Dynamic Stream Processing

2020-07-07 Thread Rob Shepherd
Hi All, It'd be great to consider stream processing as a platform for our upcoming projects. Flink seems to be the closeted match. However we have numerous stream processing workloads and would want to be able to scale up to 1000's different streams; each quite similar in structure/sequence but

Check pointing for simple pipeline

2020-07-07 Thread Prasanna kumar
Hi , I have pipeline. Source-> Map(JSON transform)-> Sink.. Both source and sink are Kafka. What is the best checkpoint ing mechanism? Is setting checkpoints incremental a good option? What should be careful of? I am running it on aws emr. Will checkpoint slow the speed? Thanks, Prasanna.

Re: Manual allocation of slot usage

2020-07-07 Thread Mu Kong
Hi, Guo, Thanks for helping out. My application has a kafka source with 60 subtasks(parallelism), and we have 15 task managers with 15 slots on each. *Before I applied the cluster.evenly-spread-out-slots,* meaning it is set to default false, the operator 'kafka source" has 11 subtasks allocated

Re: Timeout when using RockDB to handle large state in a stream app

2020-07-07 Thread Yun Tang
Hi Felipe flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed cannot tell you how much memory is used by RocksDB as it mallocate memory from os directly instead from JVM. Moreover, I cannot totally understand why you ask how to increase the memory of the JM and TM when using the PredefinedOp

Re: Stateful Functions: Deploying to existing Cluster

2020-07-07 Thread Jan Brusch
Hi Igal, just as a feedback for you and anyone else reading this: Worked like a charm. Thanks again for your quick help! Best regards Jan On 06.07.20 14:02, Igal Shilman wrote: Hi Jan, Stateful functions would look at the java class path for the module.yaml, So one way would be including

Re: SSL for QueryableStateClient

2020-07-07 Thread Chesnay Schepler
Queryable state does not support SSL. On 06/07/2020 22:42, mail2so...@yahoo.co.in wrote: Hello, I am running flink on Kubernetes, and from outside the Ingress to a proxy on Kubernetes is via SSL 443 PORT only. Can you please provide guidance on how to setup the SSL for /*QueryableStateClien

Re: Manual allocation of slot usage

2020-07-07 Thread Yangze Guo
Hi, Mu, IIUC, cluster.evenly-spread-out-slots would fulfill your demand. Why do you think it does the opposite of what you want. Do you run your job in active mode? If so, cluster.evenly-spread-out-slots might not work very well because there could be insufficient task managers when request slot f

Manual allocation of slot usage

2020-07-07 Thread Mu Kong
Hi community, I'm running an application to consume data from kafka, and process it then put data to the druid. I wonder if there is a way where I can allocate the data source consuming process evenly across the task manager to maximize the usage of the network of task managers. So, for example,

Re: How to ensure that job is restored from savepoint when using Flink SQL

2020-07-07 Thread Fabian Hueske
Hi Jie Feng, As you said, Flink translates SQL queries into streaming programs with auto-generated operator IDs. In order to start a SQL query from a savepoint, the operator IDs in the savepoint must match the IDs in the newly translated program. Right now this can only be guaranteed if you transl

Any idea for data skew in hash join

2020-07-07 Thread faaron zheng
Hi, all, I use flink 1.10 to run a sql and I find that almost 60% of the data is concentrated on one parallelism. Is there any good idea for this scene?

Re: Heartbeat of TaskManager timed out.

2020-07-07 Thread Ori Popowski
I wouldn't want to jump into conclusions, but from what I see, very large lists and vectors do not work well with flatten in 2.11, each for its own reasons. In any case, it's 100% not a Flink issue. On Tue, Jul 7, 2020 at 10:10 AM Xintong Song wrote: > Thanks for the updates, Ori. > > I'm not f

Re: Heartbeat of TaskManager timed out.

2020-07-07 Thread Xintong Song
Thanks for the updates, Ori. I'm not familiar with Scala. Just curious, if what you suspect is true, is it a bug of Scala? Thank you~ Xintong Song On Tue, Jul 7, 2020 at 1:41 PM Ori Popowski wrote: > Hi, > > I just wanted to update that the problem is now solved! > > I suspect that Scala's