State Processor API to bootstrap keyed state for Stream Application.

2020-08-07 Thread Marco Villalobos
I have read the documentation and various blogs that state that it is possible to load data into a data-set and use that data to bootstrap a stream application. The documentation literally says this, "...you can read a batch of data from any store, preprocess it, and write the result to a savepoin

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-07 Thread Felipe Lolas
Hi all! I'm new here. I have been using the Flink connector for HBase 1.2, but recently opted to upgrade to HBase 2.1 (basically because it was bundled in CDH6). It would be nice to add support for HBase 2.x! I found that supporting HBase 1.4.3 and 2.1 needs minimal changes and keeping that in mind

Re: GroupBy with count on a joined table only lets me write using toRetractStream

2020-08-07 Thread Faye Pressly
Sorry, I just noticed I made a typo in the last table (clickAdvertId != null instead of clickCount != null) Table allImpressionTable = impressionsTable .leftOuterJoin(clicksTable, "clickImpId = impImpressionId && clickMinute = impMinute") .groupBy("impAdvertId, impVariationName, impMinute

GroupBy with count on a joined table only lets me write using toRetractStream

2020-08-07 Thread Faye Pressly
Hello, I have a stream of events (coming from a Kinesis Stream) of this form: impressionId | advertid | variationName | eventType | eventTime The end goal is to output back on a Kinesis Stream the count of events of type 'impression' and the count of events of type 'click'. However, I need to dro

Native K8S Jobmanager restarts and job never recovers

2020-08-07 Thread Bohinski, Kevin
Hi all, In our 1.11.1 native k8s session, after we submit a job it runs successfully for a few hours, then fails when the jobmanager pod restarts. Relevant logs after restart are attached below. Any suggestions? Best, Kevin 2020-08-06 21:50:24,425 INFO org.apache.flink.kubernetes.KubernetesR

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-07 Thread Jark Wu
I'm +1 to add HBase 2.x. However, I have some concerns about moving HBase 1.x to Bahir: 1) As discussed above, there are still lots of people using HBase 1.x. 2) Bahir doesn't have the infrastructure to run the existing HBase E2E tests. 3) We also put a lot of effort into providing an uber connector ja

Re: Submit Flink 1.11 job from java

2020-08-07 Thread David Anderson
Flavio, Have you looked at application mode [1] [2] [3], added in 1.11? It offers at least some of what you are looking for -- the application jar and its dependencies can be pre-uploaded to HDFS, and the main() method runs on the job manager, so none of the classes have to be loaded in the client

Flink job percentage

2020-08-07 Thread Flavio Pompermaier
Hi all, one of our customers asked us to show the percentage of completion of a Flink batch job. Is there an already implemented heuristic I can use to compute it? Will this also be possible when the DataSet API migrates to DataStream? Thanks in advance, Flavio

Re: Only One TaskManager Showing High CPU Usage

2020-08-07 Thread Jake
Hi Mason, Can you use JVM CPU performance analysis tools such as JProfiler and https://github.com/alibaba/arthas? You can probably guess the reason for the high CPU load. Jake > On Aug 6, 2020, at 12:25 PM, Chen, Mason wrote: > > Thanks Peter for the reply.

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-07 Thread Robert Metzger
Hi, Thank you for picking this up so quickly. I have no objections regarding all the proposed items. @Gyula: Once the Bahir contribution is properly reviewed, ping me if you need somebody to merge it. On Fri, Aug 7, 2020 at 10:43 AM Márton Balassi wrote: > Hi Robert and Gyula, > > Thanks for r

Flink maxrecordcount increase causing a few task manager throughput drops

2020-08-07 Thread Terry Chia-Wei Wu
Hi, I changed the following config from flink.shard.getrecords.maxrecordcount: 1000 flink.shard.getrecords.intervalmillis: 200 to flink.shard.getrecords.maxrecordcount: 1 flink.shard.getrecords.intervalmillis: 1000 and found that a few task managers (around 10/1000) are becoming very slow. We als

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-07 Thread Márton Balassi
Hi Robert and Gyula, Thanks for reviving this thread. We have the implementation (currently for 2.2.3) and it is straightforward to contribute it back. Miklos (cc'd) has recently written a readme for said version, and he would be interested in contributing the upgraded connector back. The latest HBase

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-07 Thread Gyula Fóra
Hi Robert, I completely agree with you on the Bahir-based approach. I am happy to help with the contribution on the Bahir side, with thorough review and testing. Cheers, Gyula On Fri, 7 Aug 2020 at 09:30, Robert Metzger wrote: > It seems that this thread is not on dev@ anymore. Adding it back

Re: Submit Flink 1.11 job from java

2020-08-07 Thread Flavio Pompermaier
The problem with env.executeAsync is that I need to load the job classes on the client side and this is something I'd like to avoid because it's a source of problems. I'd like to tell Flink to run a jar that is available somewhere (on the flink instances or on the blob server or on a network filesy

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-07 Thread Robert Metzger
It seems that this thread is not on dev@ anymore. Adding it back ... On Fri, Aug 7, 2020 at 9:23 AM Robert Metzger wrote: > I would like to revive this discussion. There's a new JIRA[1] + PR[2] for > adding HBase 2 support. > > It seems that there is demand for an HBase 2 connector, and consensus

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-07 Thread Robert Metzger
I would like to revive this discussion. There's a new JIRA[1] + PR[2] for adding HBase 2 support. It seems that there is demand for an HBase 2 connector, and consensus to do it. The remaining question in this thread seems to be the "how". I would propose to go the other way around as Gyula suggest