Re: EventTimeSessionWindow firing too soon

2020-06-15 Thread Ori Popowski
So why is it happening? I have no clue at the moment. My event-time timestamps also do not have big gaps between them that would explain the window triggering. On Mon, Jun 15, 2020 at 9:21 PM Robert Metzger wrote: > If you are using event time in Flink, it is disconnected from the real > world

pyflink连接elasticsearch5.4问题

2020-06-15 Thread jack
我这边使用的是pyflink连接es的一个例子,我这边使用的es为5.4.1的版本,pyflink为1.10.1,连接jar包我使用的是 flink-sql-connector-elasticsearch5_2.11-1.10.1.jar,kafka,json的连接包也下载了,连接kafka测试成功了。 连接es的时候报错,findAndCreateTableSink failed。 是不是es的连接jar包原因造成的?哪位遇到过类似问题还请指导一下,感谢。 Caused by Could not find a suitable factory for ‘org.apac

Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

2020-06-15 Thread Marco Villalobos
Does Flink support reading files or CSV files from java.io.InputStream instead of file paths? I'd rather just store my file on the class path and load it with java.lang.ClassLoader#getResourceAsStream(String). If there is a way, I'd appreciate an example.

Re: Native K8S not creating TMs

2020-06-15 Thread Yang Wang
Hi Kevin, Sorry for not notice your last response. Could you share you full DEBUG level jobmanager logs? I will try to figure out whether it is a issue of Flink or K8s. Because i could not reproduce your situation with my local K8s cluster. Best, Yang Yang Wang 于2020年6月8日周一 上午11:02写道: > Hi Ke

Re: pyflink数据查询

2020-06-15 Thread jack
hi 感谢您的建议,我这边尝试一下自定义实现sink的方式。 Best, Jack 在 2020-06-15 18:08:15,"godfrey he" 写道: hi jack,jincheng Flink 1.11 支持直接将select的结果collect到本地,例如: CloseableIterator it = tEnv.executeSql("select ...").collect(); while(it.hasNext()) { it.next() } 但是 pyflink 还没有引入 collect() 接口。(后续

Re: Does anyone have an example of Bazel working with Flink?

2020-06-15 Thread Dan Hill
Thanks for the replies! I was able to use the provided answers to get a setup working (maybe not the most efficiently). The main change I made was to switch to including the deploy jar in the image (rather than the default one). I'm open to contributing to a "rules_flink" project. I don't know

Re: [External] Measuring Kafka consumer lag

2020-06-15 Thread Padarn Wilson
Thanks Robert. Yes we monitor many of the Flink internal metric, which is why I was surprised that we were unable to notice the warning signs before our consumers notified us. It would be nice to measure the topic vs consumer group offset of the flink consumer. On Tue, Jun 16, 2020 at 1:57 AM Ro

Re: Reading from AVRO files

2020-06-15 Thread Arvid Heise
Hi Lorenzo, Thank you for confirming my suspicion. It really means something is broken in your Avro compiler setup and there is not much that we can do on our end. Just for reference, we are having a user.avsc [1] being compiled [2] with 1.8.2 into this snippet [3] for our tests. Look especially

Re: EventTimeSessionWindow firing too soon

2020-06-15 Thread Robert Metzger
If you are using event time in Flink, it is disconnected from the real world wall clock time. You can process historical data in a streaming program as if it was real-time data (potentially reading through (event time) years of data in a few (wall clock) minutes) On Mon, Jun 15, 2020 at 4:58 PM Yi

Re: Shared state between two process functions

2020-06-15 Thread Robert Metzger
Thanks for sharing some details on the use case: Are you able to move the common computation into one operator that runs before the ProcessFunctions, and you are sending the results there? You can build quite advanced dataflow graphs with Flink to model your problem. On Mon, Jun 15, 2020 at 9:01 A

Re: Request: Documentation for External Communication with Flink Cluster

2020-06-15 Thread Chesnay Schepler
https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/rest_api.html#api On 15/06/2020 17:47, Morgan Geldenhuys wrote: Hi Community, I am interested in creating an external client for submitting and managing Flink jobs via a HTTP/REST endpoint. Taking a look at the documentat

Re: Have already anyone tried to monitor Flink applications using VisualVM?

2020-06-15 Thread Robert Metzger
Thanks a lot for posting the solution! It might be helpful for other users. On Mon, Jun 15, 2020 at 2:36 PM Felipe Gutierrez < felipe.o.gutier...@gmail.com> wrote: > Hi, I managed to find the solution. Just for the record I am gonna post > here: > > I added the JMX parameters on the file bin/flin

Re: How do I backfill time series data?

2020-06-15 Thread Robert Metzger
Hi Marco, I'm not 100% if I understood the problem. Let me repeat: You want a stream of 15 minute averages for each unique "name". If there's no data available for a 15m average, use the data from the previous 15m time window? If that's the problem, you can probably build this using ProcessFuncti

Request: Documentation for External Communication with Flink Cluster

2020-06-15 Thread Morgan Geldenhuys
Hi Community, I am interested in creating an external client for submitting and managing Flink jobs via a HTTP/REST endpoint. Taking a look at the documentation, external communication is possible with the Dispatcher and JobManager (https://ci.apache.org/projects/flink/flink-docs-stable/ops/s

Re: Testing multi-sink flink jobs

2020-06-15 Thread Robert Metzger
Hey Marie, Why does it seem that you can only use a single instance of CollectSink? >From the CollectSink constructor signature, it seems that you can pass host + port to the class. Can't you allocate the CollectSink on different ports for the different sinks? Best, Robert On Wed, Jun 10, 2020

Re: Running Kubernetes on Flink with Savepoint

2020-06-15 Thread Robert Metzger
Hi Matt, sorry for the late reply. Why are you using the "flink-docker" helm example instead of https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html or https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html ? I don't think th

Re: EventTimeSessionWindow firing too soon

2020-06-15 Thread Yichao Yang
Hi I think it maybe you use the event time, and the timestamp between your event data is bigger than 30minutes, maybe you can check the source data timestamp. Best, Yichao Yang 发自我的iPhone -- Original -- From: Ori Popowski

Re: Stopping flink application with /jobs/:jobid/savepoints or /jobs/:jobid/stop

2020-06-15 Thread Robert Metzger
Hi, as Kostas said, draining just influences the watermarks Flink sends through your streaming topology. Maybe the watermarks you are sending yourself through the topology cause the state to be drained? I also don't know what the Beam API is doing underneath. Maybe it makes sense to check the Beam

EventTimeSessionWindow firing too soon

2020-06-15 Thread Ori Popowski
I'm using Flink 1.10 on YARN, and I have a EventTimeSessionWindow with a gap of 30 minutes. But as soon as I start the job, events are written to the sink (I can see them in S3) even though 30 minutes have not passed. This is my job: val stream = senv .addSource(new FlinkKafkaConsumer("…",

Kinesis ProvisionedThroughputExceededException

2020-06-15 Thread M Singh
Hi: I am using multiple (almost 30 and growing) Flink streaming applications that read from the same kinesis stream and get  ProvisionedThroughputExceededException exception which fails the job. I have seen a reference  http://mail-archives.apache.org/mod_mbox/flink-user/201811.mbox/%3CCAJnSTVxpu

[ANNOUNCE] Weekly Community Update 2020/23-24

2020-06-15 Thread Konstantin Knauf
Dear community, happy to share this community update on the last two weeks including the release of Stateful Functions 2.1, a table source for ElasticSearch, and a bit more. The community is still working on release testing for Apache Flink 1.11, so still comparably quite. Expecting the first feat

Re: Have already anyone tried to monitor Flink applications using VisualVM?

2020-06-15 Thread Felipe Gutierrez
Hi, I managed to find the solution. Just for the record I am gonna post here: I added the JMX parameters on the file bin/flink to submit the FLink job using these parameters. log_setting=(-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port="9010" -Dcom.sun.management.jmxremote.loca

Have already anyone tried to monitor Flink applications using VisualVM?

2020-06-15 Thread Felipe Gutierrez
Hi, I want to run a flink job with the JVM parameters "-Dcom.sun.management.jmxremote.port= -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false" in order to monitor this job with VisualVM. So I am doing like this: ./bin/flink run ../TaxiRideCount.jar -yD

Re: pyflink数据查询

2020-06-15 Thread godfrey he
hi jack,jincheng Flink 1.11 支持直接将select的结果collect到本地,例如: CloseableIterator it = tEnv.executeSql("select ...").collect(); while(it.hasNext()) { it.next() } 但是 pyflink 还没有引入 collect() 接口。(后续会完善?@jincheng) 但是1.11的TableResult#collect实现对流的query支持不完整(只支持append only的query),master已经完整支持。 可以参照 j

Re: pyflink数据查询

2020-06-15 Thread jincheng sun
你好 Jack, > pyflink 从source通过sql对数据进行查询聚合等操作 不输出到sink中,而是可以直接作为结果, 我这边可以通过开发web接口直接查询这个结果,不必去sink中进行查询 我理解你上面说的 【直接作为结果】+ 【web接口查询】已经包含了“sink”的动作。只是这个“sink” 是这样的实现而已。对于您的场景: 1. 如果您想直接将结果不落地(不存储)执行推送的 web页面,可以自定义一个Web Socket的Sink。 2. 如果您不是想直接推送到web页面,而是通过查询拉取结果,那么您上面说的 【直接作为结果】这句话就要描述一下,您想怎样作为结果?我

Re: Monitor job execution status per stage and aggregated time spent in stages

2020-06-15 Thread Till Rohrmann
Hi Theo, to the best of my knowledge this effort has not made a lot of progress recently. But thanks for voicing your interest in this feature. The community will try to take it into account when deciding on the priorities for the next release. Cheers, Till On Sat, Jun 13, 2020 at 10:59 PM Theo

Re: The Flink job recovered with wrong checkpoint state.

2020-06-15 Thread Thomas Huang
@Yun Tang,Thanks. From: Yun Tang Sent: Monday, June 15, 2020 11:30 To: Thomas Huang ; Flink Subject: Re: The Flink job recovered with wrong checkpoint state. Hi Thomas The answer is yes. Without high availability, once the job manager is

Re: Shared state between two process functions

2020-06-15 Thread Jaswin Shah
Basically, I have multiple processafunctions and they are doing some.computations based on some historical results and the historical events and results are common across the process functions due to which I have a lot of redundant processing in many process functions so, I have been thinking