Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-06-18 Thread Leonard Xu
+1 to support HBase 2.2.x, and +1 to retain HBase 1.4.3 until we deprecates finished(maybe one version is enough). Currently we only support HBase 1.4.3 which is pretty old, and I’m making a flink-sql-connector-hbase[1] shaded jar for pure SQL user, the dependencies is a little more complex.

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-06-18 Thread Jark Wu
cc'ed YuLi and ZhengHu who are HBase PMC members and may know how many users are using HBase 1.4.x and 2.2.x Best, Jark On Fri, 19 Jun 2020 at 14:20, jackylau wrote: > + 1 to support HBase 2.x and the hbase 2.x client dependencies are simple > and clear. Tbe hbase project shades them all > > Ja

Re: Trouble with large state

2020-06-18 Thread Vijay Bhaskar
Thanks for the reply. I want to discuss more on points (1) and (2) If we take care of them rest will be good Coming to (1) Please try to give reasonable checkpoint interval time for every job. Minum checkpoint interval recommended by flink community is 3 minutes I thin you should give minimum 3

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-06-18 Thread jackylau
+ 1 to support HBase 2.xand the hbase 2.x client dependencies are simple and clear. Tbe hbase project shades them all Jark Wu-3 wrote > +1 to support HBase 2.xBut not sure about dropping support for 1.4.xI > cc'ed to user@ and user-zh@ to hear more feedback from users.Best,JarkOn > Thu, 18 Jun 2020

Re: Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-06-18 Thread Gyula Fóra
Hi! Having both 1.4.x and 2.x supported means we need different modules or some shim logic as they are not compatible with each other. I would love to avoid this if possible because it is a lot of extra effort from a maintainability perspective. It would be great to see how many users still use H

Re:Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-06-18 Thread chaojianok
+1 to support HBase 2.x And I think the 1.4.x version can be retained for the time being, so that users who are currently using the 1.4.x version can have more time to evaluate whether their projects need to be upgraded and the cost of upgrading. At 2020-06-19 12:35:36, "Jark Wu"

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-06-18 Thread Jark Wu
+1 to support HBase 2.x But not sure about dropping support for 1.4.x I cc'ed to user@ and user-zh@ to hear more feedback from users. Best, Jark On Thu, 18 Jun 2020 at 21:25, Gyula Fóra wrote: > Hi All! > > I would like to revive an old ticket >

Re: Native K8S not creating TMs

2020-06-18 Thread Yang Wang
Thanks for sharing the DEBUG level log. I carefully check the logs and find that the kubernetes-client discovered the api server address and token successfully. However, it could not contact with api server(10.100.0.1:443). Could you check whether you api server is configured to allow accessing w

Join tables created from Datastream whose element scala type has field Option[_]

2020-06-18 Thread YI
Hi, all I am trying to join two datastream whose element types are respectively ``` case class MyEvent( _id: Long = 0L, _cId: Long = 0L, _url: Option[String] = None, ) ``` and ``` case class MyCategory( _id: Long = 0L, _name: Option[String] = None, ) ``` When I tried to join those two tables with

Re:pyflink的table api 能否根据 filed的值进行判断插入到不同的sink表中

2020-06-18 Thread jack
测试使用如下结构: table= t_env.from_path("source") if table.filter("logType=syslog"): table.filter("logType=syslog").insert_into("sink1") elif table.filter("logType=alarm"): table.filter("logType=alarm").insert_into("sink2") 我测试了下,好像table.filter("logType=syslog").insert_into("sink1")生效,下面的elif不生效,

pyflink的table api 能否根据 filed的值进行判断插入到不同的sink表中

2020-06-18 Thread jack
使用pyflink的table api 能否根据 filed的值进行判断插入到不同的sink表中? 场景:使用pyflink通过filter进行条件过滤后插入到sink中, 比如以下两条消息,logType不同,可以使用filter接口进行过滤后插入到sink表中: { "logType":"syslog", "message":"sla;flkdsjf" } { "logType":"alarm", "message":"sla;flkdsjf" } t_env.from_path("source")\ .filter("

Re: Blink Planner Retracting Streams

2020-06-18 Thread John Mathews
Below is a basic unit test of what we are trying to achieve, but basically, we are trying to convert from a retracting stream to a RetractingStreamTableSink, which is easily done with the CRow from the original flink planner, but seems to be very difficult to do with the blink planner. The below f

Re: Kinesis ProvisionedThroughputExceededException

2020-06-18 Thread M Singh
Thanks Roman for your response.  Mans On Wednesday, June 17, 2020, 03:26:31 AM EDT, Roman Grebennikov wrote: #yiv4075825537 p.yiv4075825537MsoNormal, #yiv4075825537 p.yiv4075825537MsoNoSpacing{margin:0;}Hi, It will occur if your job will reach SHARD_GETRECORDS_RETRIES consecutive fai

Re: Completed Job List in Flink UI

2020-06-18 Thread Chesnay Schepler
https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#jobstore-expiration-time On 18/06/2020 19:57, Ivan Yang wrote: Hello, In Flink web UI Overview tab, "Completed Job List” displays recent completed or cancelled job only for short period of time. After a while, they are

Re: Trouble with large state

2020-06-18 Thread Jeff Henrikson
Vijay, Thanks for your thoughts. Below are answers to your questions. > 1. What's your checkpoint interval? I have used many different checkpoint intervals, ranging from 5 minutes to never. I usually setMinPasueBetweenCheckpoints to the same value as the checkpoint interval. > 2. How freq

Re: Trouble with large state

2020-06-18 Thread Jeff Henrikson
Hi Yun, Thanks for your thoughts. Answers to your questions: > 1. "after around 50GB of state, I stop being able to reliably take > checkpoints or savepoints. " > What is the exact reason that job cannot complete checkpoint? > Expired before completing or decline by some tasks? The

Completed Job List in Flink UI

2020-06-18 Thread Ivan Yang
Hello, In Flink web UI Overview tab, "Completed Job List” displays recent completed or cancelled job only for short period of time. After a while, they are gone. The Job Manager is up and never restarted. Is there a config key to keep job history in the Completed Job List for longer time? I am

Re: what is the "Flink" recommended way of assigning a backfill to an average on an event time keyed windowed stream?

2020-06-18 Thread Marco Villalobos
I came up with a solution for backfills. However, at this moment, I am not happy with my solution. I think there might be other facilities within Flink which allow me to implement a better more efficient or more scalable solution. In another post, rmetz...@apache.org suggested that I use a proce

Re: Does anyone have an example of Bazel working with Flink?

2020-06-18 Thread Austin Cawley-Edwards
Ok, no worries Aaron, that's still good advice :) One last question - are you using JAR-based or image-based deployments? The only real problem using Flink & Bazel and a JAR-based deployment from our experience is removing the Flink libs present in the deploy environment from the deploy jar, and s

Re: Does anyone have an example of Bazel working with Flink?

2020-06-18 Thread Aaron Levin
Hi Austin, In our experience, `rules_scala` and `rules_java` are enough for us at this point. It's entirely possible I'm not thinking far enough into the future, though, so don't take our lack of investment as a sign it's not worth investing in :) Best, Aaron Levin On Thu, Jun 18, 2020 at 10:2

Re: Blink Planner Retracting Streams

2020-06-18 Thread John Mathews
So the difference between Tuple2 and CRow is that CRow has a special TypeInformation defined here: https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/types/CRowTypeInfo.scala#L32 that returns the TypeInfo of the underlying row

Re: Trouble with large state

2020-06-18 Thread Vijay Bhaskar
For me this seems to be an IO bottleneck at your task manager. I have a couple of queries: 1. What's your checkpoint interval? 2. How frequently are you updating the state into RocksDB? 3. How many task managers are you using? 4. How much data each task manager handles while taking the checkpoint?

Re: Does anyone have an example of Bazel working with Flink?

2020-06-18 Thread Austin Cawley-Edwards
Great to hear Dan! @Aaron - would you/ your team be interested in a `rules_flink` project? I'm still fairly new to Bazel and know enough to contribute, but could definitely use guidance on design as well. Best, Austin On Mon, Jun 15, 2020 at 11:07 PM Dan Hill wrote: > Thanks for the replies!

Issue with job status

2020-06-18 Thread Vijay Bhaskar
Hi I am using flink 1.9 and facing the below issue Suppose i have deployed any job and in case there are not enough slots, then the job is stuck in waiting for slots. But flink job status is showing it as "RUNNING" actually it's not. For me this is looking like a bug. It impacts our production whi

Submitted Flink Jobs EMR are failing (Could not start rest endpoint on any port in port range 8081)

2020-06-18 Thread sk_ac...@yahoo.com
I am using EMR 5.30.0 and trying to submit a Flink (1.10.0) job using the following command flink run -m yarn-cluster /home/hadoop/flink--test-0.0.1-SNAPSHOT.jar and i am getting the following error:     Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN app

Re: Trouble with large state

2020-06-18 Thread Timothy Victor
I had a similar problem. I ended up solving by not relying on checkpoints for recovery and instead re-read my input sources (in my case a kafka topic) from the earliest offset and rebuilding only the state I need. I only need to care about the past 1 to 2 days of state so can afford to drop anyt

Re: Unable to run flink job in dataproc cluster with jobmanager provided

2020-06-18 Thread Sourabh Mehta
Hi , application is using 1.10.0 but cluster is setup on 1.9.0. Yes I do have access. please find below starting logs from cluster 2020-06-17 11:28:18,989 INFO org.apache.shaded.flink.table.module.ModuleManager- Got FunctionDefinition equals from module core 2020-06-17 11:28:20,538