Re: In consistent Check point API response

2020-05-26 Thread Yun Tang
Hi Bhaskar I think I have understood your scenario now. And I think this is what is expected in Flink. As you only allow your job to restore 5 times, the "restore" would only record the checkpoint to restore at the 5th recovery, and the checkpoint id would always stay there. "Restored" is for

Re: java.lang.AbstractMethodError when implementing KafkaSerializationSchema

2020-05-26 Thread Aljoscha Krettek
I think what might be happening is that you're mixing dependencies from the flink-sql-connector-kafka and the proper flink-connector-kafka that should be used with the DataStream API. Could that be the case? Best, Aljoscha On 25.05.20 19:18, Piotr Nowojski wrote: Hi, It would be helpful if y

Re: In consistent Check point API response

2020-05-26 Thread Vijay Bhaskar
Thanks Yun. How can I contribute better documentation on this by opening a Jira issue? Regards Bhaskar On Tue, May 26, 2020 at 12:32 PM Yun Tang wrote: > Hi Bhaskar > > I think I have understood your scenario now. And I think this is what > expected in Flink. > As you only allow your job co

Question on Job Restart strategy

2020-05-26 Thread Vijay Bhaskar
Hi We are using the fixed-delay restart strategy. I have a fundamental question: Why is the reset counter not zero after the streaming job restart is successful? Let's say the maximum number of restarts is 5. My streaming job tried 2 times and on the 3rd attempt it was successful; why is the counter still 2 but not ze

Re: java.lang.AbstractMethodError when implementing KafkaSerializationSchema

2020-05-26 Thread Leonard Xu
Hi, wanglei I think Aljoscha is right. Could you post your dependency list? The flink-connector-kafka dependency is for DataStream applications, which is the one you should use; the flink-sql-connector-kafka dependency is for Table API & SQL applications. We should only add one of them because the two depend

Re: Testing process functions

2020-05-26 Thread Manas Kale
Thank you for the example, Alexander. On Wed, May 20, 2020 at 6:48 PM Alexander Fedulov wrote: > Hi Manas, > > I would recommend using TestHarnesses for testing. You could also use them > prior to 1.10. Here is an example of setting the dependencies: > > https://github.com/afedulov/fraud-detecti

Re: How can I set the parallelism higher than the task slot number in more machines?

2020-05-26 Thread Till Rohrmann
Hi Felipe, Flink does not create dummy operators. Unless you have configured one operator to have a parallelism of 32, you should actually only see 16 subtasks of a given operator (given that you start your program with -p 16). Be aware, though, that if you have multiple operators which cannot sha

Re: close file on job crash

2020-05-26 Thread Laurent Exsteens
Thanks! On Tue, May 26, 2020, 08:13 Piotr Nowojski wrote: > Hi, > > One clarification. `RichFunction#close` is of course called always, not > only after internal failure. It’s called after internal failure, external > failure or clean shutdown. > > `SourceFunction#cancel` is intended to inform t

Modified & rebuilt Flink source code but changes do not work

2020-05-26 Thread Qi K.
Hi folks, Within our team, we made some simple changes to the source code of flink-runtime module (mostly related to log levels, like INFO -> WARN). Then we rebuilt the whole Flink project using `mvn clean install -DskipTests` command (Flink version = 1.9.3, Maven version = 3.2.5), the proce

Re: Using Queryable State within 1 job + docs suggestion

2020-05-26 Thread Annemarie Burger
Hi, I managed to work around the JobID issues, by first starting the task that queries the state, pausing it, and then using env.executeAsync.getJobID to get the proper jobID to use when querying the state, and passing that to the (paused) query state task, which can then continue. However, the Q

Re: Modified & rebuilt Flink source code but changes do not work

2020-05-26 Thread Congxian Qiu
Hi If you committed the change in your local git repo, could you please check whether the commit id in the job log (such as `Rev:28bdd33`, where 28bdd33 is the commit id) is the same as the local commit id? Best, Congxian Qi K. wrote on Tue, May 26, 2020 at 4:47 PM: > Hi folks, > > Within our team, we made some simple

Re: Multiple Sinks for a Single Soure

2020-05-26 Thread Piotr Nowojski
Hi, You could easily filter/map/process the streams differently before writing them to the sinks. Building on top of my previous example, this also should work fine: DataStream myStream = env.addSource(…).foo().bar() // for custom source, but any ; myStream.baz().addSink(sink1); myStream.add

Re: Stateful-fun-Basic-Hello

2020-05-26 Thread Igal Shilman
Hi, Can you verify that your jar contains the following file META-INF/services/org.apache.flink.statefun.sdk.spi.StatefulFunctionModule ? Thanks, Igal. On Tue, May 26, 2020 at 11:49 AM C DINESH wrote: > Hi Gordon, > > Thanks for your response. > > After adding this conf to flink-yml. > > `class

Re: Multiple Sinks for a Single Soure

2020-05-26 Thread Prasanna kumar
Thanks Piotr for the Reply. I will explain my requirement in detail. Table Updates -> Generate Business Events -> Subscribers *Source Side* There are CDC of 100 tables which the framework needs to listen to. *Event Table Mapping* There would be Event associated with table in a *m:n* fashion.

Re: Apache Flink - Question about application restart

2020-05-26 Thread M Singh
Hi Zhu Zhu: I have another clarification - it looks like if I run the same app multiple times - its job id changes. So it looks like even though the graph is the same the job id is not dependent on the job graph only since with different runs of the same app it is not the same. Please let me kn

ClusterClientFactory selection

2020-05-26 Thread M Singh
Hi: I wanted to find out which parameter/configuration allows the Flink CLI to pick up the appropriate cluster client factory (especially in YARN mode). Thanks

Re: Question on Job Restart strategy

2020-05-26 Thread Gary Yao
Hi Bhaskar, > Why the reset counter is not zero after streaming job restart is successful? The short answer is that the fixed delay restart strategy is not implemented like that (see [1] if you are using Flink 1.10 or above). There are also other systems that behave similarly, e.g., Apache Hadoop
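To make the behaviour discussed in this thread concrete, here is a minimal plain-Java sketch of a restart counter that is never reset after a successful restart. It is an illustration only and not Flink's implementation: the class `FixedDelayCounterSketch` and its methods are hypothetical names, and the restart delay itself is left out.

```java
/**
 * Hypothetical illustration only; this is NOT Flink's implementation.
 * It models a restart budget that is counted over the whole lifetime
 * of the job and is never reset after a successful restart.
 */
final class FixedDelayCounterSketch {

    private final int maxAttempts;
    private int restartsSoFar = 0; // cumulative, never reset to zero

    FixedDelayCounterSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Returns true and counts the restart if the budget is not exhausted. */
    boolean tryRestart() {
        if (restartsSoFar >= maxAttempts) {
            return false;
        }
        restartsSoFar++;
        return true;
    }

    int restartsSoFar() {
        return restartsSoFar;
    }

    public static void main(String[] args) {
        FixedDelayCounterSketch strategy = new FixedDelayCounterSketch(5);
        strategy.tryRestart(); // first failure
        strategy.tryRestart(); // second failure, after which the job runs fine
        // The counter stays at 2 even though the job is healthy again.
        System.out.println(strategy.restartsSoFar()); // prints 2
    }
}
```

Under this model, a job with a maximum of 5 restarts that has already recovered twice has only 3 restarts left, however long it has been running successfully since.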

Re: Question about My Flink Application

2020-05-26 Thread Sara Arshad
Hi Alexander, Thank you for your reply. I got a reply from the AWS people. It seems to be a configuration problem. But even if it works fine without restarting, it's not a good option for us. There is no one-to-one relation between cache data and keyed values. Therefore, it has to send the whole dat

Re: Multiple Sinks for a Single Soure

2020-05-26 Thread Piotr Nowojski
Hi, I’m not sure if I fully understand what you mean by > The point is the sink are not predefined. You must know before submitting the job, what sinks are going to be used in the job. You can have some custom logic, that would filter out records before writing them to the sinks, as I propo

Re: Multiple Sinks for a Single Soure

2020-05-26 Thread Prasanna kumar
Piotr, There is an event and subscriber registry as a JSON file which has the table-event mapping and the event-subscriber mapping as mentioned below. Based on that JSON, we need the job to go through the table updates and create events, and for each event there is a setting for how to sink them. The si

Re: RocksDB savepoint recovery performance improvements

2020-05-26 Thread Joey Pereira
Following up: I've put together the implementation, https://github.com/apache/flink/pull/12345. It's passing tests but is only partially complete, as it still needs some clean-up and configuration. I still need to try running this against a production cluster to check the performance, as well as ge

Webinar: Unlocking the Power of Apache Beam with Apache Flink

2020-05-26 Thread Aizhamal Nurmamat kyzy
Hi all, Please join our webinar this Wednesday at 10am PST/5:00pm GMT/1:00pm EST where Max Michels - PMC member for Apache Beam and Apache Flink, will deliver a talk about leveraging Apache Beam for large-scale stream and batch analytics with Apache Flink. You can register via this link: https://

Re: RocksDB savepoint recovery performance improvements

2020-05-26 Thread Steven Wu
Yun, you mentioned that checkpoint also supports rescale. I thought the recommendation [1] is to use savepoint for rescale. [1] https://www.ververica.com/blog/differences-between-savepoints-and-checkpoints-in-flink On Tue, May 26, 2020 at 6:46 AM Joey Pereira wrote: > Following up: I've put tog

Tumbling windows - increasing checkpoint size over time

2020-05-26 Thread Wissman, Matt
Hello Flink Community, I’m running a Flink pipeline that uses a tumbling window and incremental checkpoints with RocksDB backed by S3. The number of objects in the window is stable but over time the checkpoint size grows seemingly unbounded. Within the first few hours after bringing the Flink pip

Re: Tumbling windows - increasing checkpoint size over time

2020-05-26 Thread Guowei Ma
Hi, Matt The total size of the state of the window operator is related to the number of windows. For example, if you use keyby+tumblingwindow there would be as many windows as there are keys. Hope this helps. Best, Guowei Wissman, Matt wrote on Wed, May 27, 2020 at 3:35 AM: > > Hello Flink Community, > > > > I’m running a

Re: ClusterClientFactory selection

2020-05-26 Thread Yang Wang
Hi M Singh, The Flink CLI picks up the correct ClusterClientFactory via Java SPI. You could check YarnClusterClientFactory#isCompatibleWith for how it is activated. The CLI option / configuration is "-e/--executor" or execution.target (e.g. yarn-per-job). Best, Yang M Singh wrote on Tue, May 26, 2020 at 6
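The selection step described in this reply can be sketched in plain Java as follows. This is a simplified stand-in, not Flink's code: the `ClientFactory` interface, the `TargetFactory` class, the factory names and the "some-other-target" value are all hypothetical, and the candidate factories are passed in as a list instead of being discovered through Java SPI. Only the `isCompatibleWith` method name, the `execution.target` key and the `yarn-per-job` value come from the message above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of "pick the factory that is compatible with the config". */
final class FactorySelectionSketch {

    /** Stand-in for a cluster client factory; not the Flink interface. */
    interface ClientFactory {
        boolean isCompatibleWith(Map<String, String> config);
        String name();
    }

    /** A factory that activates when execution.target equals its target. */
    static final class TargetFactory implements ClientFactory {
        private final String name;
        private final String target;

        TargetFactory(String name, String target) {
            this.name = name;
            this.target = target;
        }

        @Override
        public boolean isCompatibleWith(Map<String, String> config) {
            return target.equals(config.get("execution.target"));
        }

        @Override
        public String name() {
            return name;
        }
    }

    /** Returns the name of the first factory that accepts the configuration. */
    static Optional<String> select(List<ClientFactory> factories, Map<String, String> config) {
        return factories.stream()
                .filter(f -> f.isCompatibleWith(config))
                .map(ClientFactory::name)
                .findFirst();
    }

    public static void main(String[] args) {
        List<ClientFactory> factories = Arrays.asList(
                new TargetFactory("yarn-factory", "yarn-per-job"),
                new TargetFactory("other-factory", "some-other-target"));
        Map<String, String> config =
                Collections.singletonMap("execution.target", "yarn-per-job");
        System.out.println(select(factories, config).orElse("none")); // prints yarn-factory
    }
}
```

In this sketch the configuration value alone decides which factory is used, which mirrors the point that setting "-e/--executor" or execution.target is what activates the YARN factory.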

Re: Flink Dashboard UI Tasks hard limit

2020-05-26 Thread Xintong Song
Could you also explain how you set the parallelism when getting this execution plan? I'm asking because this json file itself only shows the resulting execution plan. It is not clear to me what is not working as expected in your case. E.g., you set the parallelism for an operator to 10 but the ex

Re: RocksDB savepoint recovery performance improvements

2020-05-26 Thread Yun Tang
@Joey Pereira I think you might need to create a new JIRA ticket and link your PR to the new issue, as FLINK-17288 mainly focuses on bulk load options while your solution focuses on the SST generator, if your solution could behav

Re: In consistent Check point API response

2020-05-26 Thread Yun Tang
To be honest, from my point of view the current description in the "Overview Tab" should already give enough explanation [1]. Latest Completed Checkpoint: The latest successfully completed checkpoints. Latest Restore: There are two types of restore operations. * Restore from Checkpoint: