Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-13 Thread Yun Gao
Hi all, I updated the FLIP [1] to reflect the major points discussed in the ML thread: 1) For "new" root tasks that finished before receiving the trigger message, we previously proposed to let the JM re-compute and re-trigger the descendant tasks, but after the discussion we realized that it might

Re: Metrics for average time taken by flatMap function

2021-01-13 Thread Manish G
This approach has an issue: even for periods with no activity, the latest gauge value is still used for calculations, which produces graphs that do not correctly represent the situation. On Tue, Jan 12, 2021 at 7:01 PM Manish G wrote: > Prometheus provides avg_over_time

Re: Re: Idea import Flink source code

2021-01-13 Thread Matthias Pohl
Don't forget to use the reply-all button when replying to threads on the mailing lists. :-) Have you tried building the project via command line using `mvn -DskipTests -Dfast install` to pull all dependencies? And just to verify: you didn't change the code, did you? We're talking about the vanilla

Re: Flink to get historical data from kafka between timespan t1 & t2

2021-01-13 Thread Aljoscha Krettek
On 2021/01/13 07:58, vinay.raic...@t-systems.com wrote: Not sure about your proposal regarding Point 3: * firstly how is it ensured that the stream is closed? If I understand the doc correctly the stream will be established starting with the latest timestamp (hmm... is it not a standard behavio

Re:Re: Re: Idea import Flink source code

2021-01-13 Thread penguin.
Hi, I click the reply button every time... Does this mean that only the replied-to person can see the email? If Maven fails to download plugins or dependencies, is mvn -clean install -DskipTests a must? I'll try first. penguin On 2021-01-13 16:35:10, "Matthias Pohl" wrote: Don't for

Re: Dead code in ES Sink

2021-01-13 Thread Aljoscha Krettek
On 2021/01/12 15:04, Rex Fenley wrote: [2] https://github.com/apache/flink/blob/97bfd049951f8d52a2e0aed14265074c4255ead0/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/table/ElasticsearchOptions.java#L131 Should [2] be remove

Re: Re: Re: Idea import Flink source code

2021-01-13 Thread Matthias Pohl
The mvn command helps to identify whether your issue is related to Maven and/or missing dependencies or whether it's an IntelliJ problem. Usually, running `mvn clean install -DskipTests -Dfast` isn't required to import the Flink project into IntelliJ. Best, Matthias PS: reply adds only the immedi

Re: Metrics for average time taken by flatMap function

2021-01-13 Thread Chesnay Schepler
If you want the gauge to only represent recent activity then you will need to use a timer of sorts to reset the gauge after N time (something larger than the reporter interval) unless it was changed in the meantime (e.g., by also recording a timestamp within SimpleGauge) On 1/13/2021 9:33 AM,
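Chesnay's timer-reset idea can be sketched in plain Java. The class below is a hypothetical simplification (Flink's real gauge interface is `org.apache.flink.metrics.Gauge<T>`); time is passed in explicitly rather than read from the system clock so the behavior is easy to verify:

```java
// Sketch: a gauge value that reads as 0 once its last update is older
// than a staleness window (choose the window larger than the metric
// reporter interval, as suggested in the thread).
class StaleResettingGauge {
    private final long maxAgeMillis;
    private volatile long value;
    private volatile long lastUpdateMillis;

    StaleResettingGauge(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    // Called from the flatMap function after each measured invocation.
    void update(long newValue, long nowMillis) {
        this.value = newValue;
        this.lastUpdateMillis = nowMillis;
    }

    // Called by the reporter; stale values read as 0 instead of
    // repeating the last observed duration forever.
    long getValue(long nowMillis) {
        return (nowMillis - lastUpdateMillis > maxAgeMillis) ? 0L : value;
    }
}
```

Recording the timestamp alongside the value (as hinted with SimpleGauge) avoids needing a separate reset timer thread: staleness is checked lazily at read time.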

Re: Flink 1.12 Kryo Serialization Error

2021-01-13 Thread Timo Walther
Hi Yuval, could you share a reproducible example with us? I see you are using SQL / Table API with a RAW type. I could imagine that the KryoSerializer is configured differently when serializing and when deserializing. This might be due to `ExecutionConfig` not shipped (or copied) through the

Re: Flink streaming sql是否支持两层group by聚合

2021-01-13 Thread Joshua Fan
Hi Jark and Benchao, I have learned from your previous email how to do pv/uv in Flink SQL. One is to make a yyyyMMdd grouping, the other is to make a day window. Thank you all. I have a question about the result output. For the yyyyMMdd grouping, every minute the database would get a record, and m

Bugs in Streaming job stopping? Weird graceful stop/restart for disjoint job graph

2021-01-13 Thread Theo Diefenthal
Hi there, I'm currently analyzing a weird behavior of one of our jobs running on YARN with Flink 1.11.2. I have a special situation here: I submit a single streaming job with a disjoint job graph, i.e. the job contains two graphs of the same kind but totally indepen

RE: Flink to get historical data from kafka between timespan t1 & t2

2021-01-13 Thread VINAY.RAICHUR
Ok. Attached is the PPT of what I am attempting to achieve w.r.t. time. Hope I am all set to achieve the three bullets mentioned in the attached slide to create reports with the KafkaSource and KafkaBuilder approach. If you have any additional tips to share, please do so after going through the slide att

Re: Flink to get historical data from kafka between timespan t1 & t2

2021-01-13 Thread Aljoscha Krettek
On 2021/01/13 12:07, vinay.raic...@t-systems.com wrote: Ok. Attached is the PPT of what am attempting to achieve w.r.t. time Hope I am all set to achieve the three bullets mentioned in attached slide to create reports with KafkaSource and KafkaBuilder approach. If you have any additional tips
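The KafkaSource/KafkaBuilder approach discussed in this thread can be sketched with the FLIP-27 source. This is an untested outline: broker address, topic, and timestamps are placeholders, and the builder method names follow more recent Flink docs (the 1.12 builder may differ slightly, e.g. around the deserializer setter):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaTimeSpanJob {
    public static void main(String[] args) throws Exception {
        long t1 = 1610000000000L; // placeholder: span start, epoch millis
        long t2 = 1610100000000L; // placeholder: span end, epoch millis

        // Start at the first offset whose record timestamp is >= t1 and
        // stop once offsets reach t2 — with setBounded() the stream is
        // finite and the job finishes on its own.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")   // placeholder
                .setTopics("events")                  // placeholder
                .setStartingOffsets(OffsetsInitializer.timestamp(t1))
                .setBounded(OffsetsInitializer.timestamp(t2))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-span")
                .print();
        env.execute("historical-span");
    }
}
```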

Re: Flink app logs to Elastic Search

2021-01-13 Thread Aljoscha Krettek
On 2021/01/11 01:29, bat man wrote: Yes, no entries to the elastic search. No indices were created in elastic. Jar is getting picked up which I can see from yarn logs. Pre-defined text based logging is also available. Hmm, I can't imagine much that could go wrong. Maybe there is some interfere

Re: Snowflake access through JDBC

2021-01-13 Thread Abhishek Rai
Thanks Roman, I ended up switching to the DataStream API and using JdbcInputFormat like you suggested and that worked out fine. Thanks! On Fri, Dec 18, 2020 at 10:21 AM Khachatryan Roman wrote: > > Hello, > > Unfortunately, this driver is not currently supported by the Table API [1]. > You can i

Publishing a table to Kafka

2021-01-13 Thread Abhishek Rai
Hello, I'm using Flink 1.11.2 where I have a SQL backed `Table` which I'm trying to write to Kafka. I'm using `KafkaTableSourceSinkFactory` which ends up instantiating a table sink of type `KafkaTableSink`. Since this sink is an `AppendStreamTableSink`, I cannot write to it using a generic table

Insufficient number of network buffers - rerun with min amount of required network memory

2021-01-13 Thread abelm
Hello! I have a program which creates and runs a local Flink 1.12 environment. I understand that, based on factors such as parallelism and the presence of any process functions, the "Insufficient number of network buffers" exception might pop up. My plan is to catch this exception inside the main

Histogram has count data but not sum

2021-01-13 Thread Manish G
Hi All, I have added a histogram to the code following here. But I observe that on the Prometheus board I get only the count metric, not the sum. The sum metric itself is missing. I have used the classes: com.codahale.metrics.Histogram org.

Re: Histogram has count data but not sum

2021-01-13 Thread Chesnay Schepler
What exactly do you mean by "count metrics" and "sum"? Given a histogram H, you should see: - one time series named H_count - one time series named H, with various labels for the different quantiles. What do you see in Prometheus, and what do you consider to be missing? On 1/13/2021 4:10 PM

Re: Histogram has count data but not sum

2021-01-13 Thread Manish G
So I am following this link too, to calculate the average request duration. As per that page, histogram metrics have _count and _sum data. Though it is a Prometheus histogram, I expected the Flink histogram to provide the same. On Wed, Jan 13, 2021 at 8:5

Re: Dead code in ES Sink

2021-01-13 Thread Rex Fenley
>The actual field is never used but it will be used to check the allowed options when verifying what users specify via "string" options. Are you saying that this option does get passed along to Elasticsearch still or that it's just arbitrarily validated? According to [1] it's been deprecated in ES

Re: Histogram has count data but not sum

2021-01-13 Thread Chesnay Schepler
Flink Histograms are exposed as Summaries, since the quantiles are computed on the client side (i.e., the Flink side). Flink Histograms cannot provide _sum data because we simply do not compute it, IIRC because Dropwizard Histograms also d
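Since the Flink-side histogram ships no _sum series, one workaround (a sketch, not an established Flink recipe) is to keep an explicit running sum and count next to the histogram and divide. The class and field names below are made up; in a real job the two values would be exported as Flink `Counter`s and divided on the dashboard:

```java
// Bookkeeping for an average the histogram cannot provide: export
// totalMillis and count as two counters (hypothetical names
// request_time_sum / request_time_count) and divide when querying,
// or locally as below.
class DurationStats {
    private long totalMillis;
    private long count;

    synchronized void record(long durationMillis) {
        totalMillis += durationMillis;
        count++;
    }

    synchronized double average() {
        return count == 0 ? 0.0 : (double) totalMillis / count;
    }
}
```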

StreamingFileSink with ParquetAvroWriters

2021-01-13 Thread Jan Oelschlegel
Hi, I'm using Flink (v1.11.2) and would like to use the StreamingFileSink for writing into HDFS in Parquet format. As it says in the documentation, I have added the dependencies: org.apache.flink flink-parquet_${scala.binary.version} ${flink.version} And this is my file sink definit

Re: state reset(lost) on TM recovery

2021-01-13 Thread Alexey Trenikhun
That is it! - The Protobuf compiler generates hashCode functions which are not stable across JVM restarts ([1]); this explains the observed behavior. It is clear that a stable hashCode is mandatory for KeyedProcessFunctions, but is it also a requirement for MapState keys? Looks like the RocksDB backend first

error accessing S3 bucket 1.12

2021-01-13 Thread Billy Bain
I'm trying to use readTextFile() to access files in S3. I have verified the s3 key and secret are clean and the s3 path is similar to com.somepath/somefile. (the names changed to protect the guilty) Any idea what I'm missing? 2021-01-13 12:12:43,836 DEBUG org.apache.flink.streaming.api.functions.

Uncaught exception in FatalExitExceptionHandler causing JM crash while canceling job

2021-01-13 Thread Kelly Smith
Hi folks, I recently upgraded to Flink 1.12.0 and I’m hitting an issue where my JM is crashing while cancelling a job. This is causing Kubernetes readiness probes to fail, the JM to be restarted, and then get in a bad state while it tries to recover itself using ZK + a checkpoint which no longe

Flink Elasticsearch Async

2021-01-13 Thread Rex Fenley
Hello, Looking at this documentation https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/connectors/elasticsearch.html#sink-bulk-flush-interval it says > The interval to flush buffered actions. Can be set to '0' to disable it. Note, both 'sink.bulk-flush.max-size' and 'sink.bul

Elasticsearch config maxes can not be disabled

2021-01-13 Thread Rex Fenley
Hello, It doesn't seem like we can disable max actions and max size for Elasticsearch connector. Docs: https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/connectors/elasticsearch.html#sink-bulk-flush-max-actions sink.bulk-flush.max-actions optional 1000 Integer Maximum number

Re: state reset(lost) on TM recovery

2021-01-13 Thread Chesnay Schepler
The FsStateBackend makes heavy use of hashCodes, so they must be stable. On 1/13/2021 7:13 PM, Alexey Trenikhun wrote: That is it! - The Protobuf compiler generates hashCode functions which are not stable across JVM restarts ([1]); this explains the observed behavior. It is clear that a stable hashCode is
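The point generalizes to any key type used with keyed state: the hash must be derived from field values only, never from object identity, so the same key hashes identically in any JVM. A small illustrative sketch (the `DeviceKey` type is invented for this example, not taken from the thread):

```java
import java.util.Objects;

// A state key whose hashCode is value-based: the same fields always
// produce the same hash, in any JVM. String.hashCode is specified by
// the JLS, so it is stable across restarts — unlike identity hashes
// or Protobuf-generated hashes.
class DeviceKey {
    private final String tenant;
    private final long deviceId;

    DeviceKey(String tenant, long deviceId) {
        this.tenant = tenant;
        this.deviceId = deviceId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DeviceKey)) return false;
        DeviceKey k = (DeviceKey) o;
        return deviceId == k.deviceId && tenant.equals(k.tenant);
    }

    @Override
    public int hashCode() {
        return Objects.hash(tenant, deviceId);
    }
}
```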

Re: state reset(lost) on TM recovery

2021-01-13 Thread Alexey Trenikhun
Ok, thanks. From: Chesnay Schepler Sent: Wednesday, January 13, 2021 11:46:15 AM To: Alexey Trenikhun ; Flink User Mail List Subject: Re: state reset(lost) on TM recovery The FsStateBackend makes heavy use of hashcodes, so it must be stable. On 1/13/2021 7:13

Interpretting checkpoint data size

2021-01-13 Thread Rex Fenley
Hello, I have incremental checkpoints turned on, and there seems to be no relation at all between how often the job checkpoints and how much data exists. Whether checkpoints are set to every 1 min or every 5 seconds, they're still around 5 GB in size and checkpoint times are still in minutes. I would exp

Re: Interpretting checkpoint data size

2021-01-13 Thread Rex Fenley
One thing that did just come to mind: possibly every time I'm submitting a job from a previous checkpoint with different settings, it has to slowly re-checkpoint all the previous data, which means there would be some warm-up time before things function in a steady state. Is this possible? On W

Flink[Python] questions

2021-01-13 Thread Dc Zhao (BLOOMBERG/ 120 PARK)
Hi Flink Community: We are using pyflink to develop a POC for our project. We encountered some questions while using Flink. We are using Flink version 1.2, Python 3.7, DataStream API. 1. Do you have examples of providing a Python customized class as a `result type`? Based on the docu

Using Kafka as bounded source with DataStream API in batch mode (Flink 1.12)

2021-01-13 Thread sagar
Hi Team, I am getting the following error while running the DataStream API in batch mode with a Kafka source. I am using FlinkKafkaConsumer to consume the data. Caused by: java.lang.IllegalStateException: Detected an UNBOUNDED source with the 'execution.runtime-mode' set to 'BATCH'. This combinati

Re: Using Kafka as bounded source with DataStream API in batch mode (Flink 1.12)

2021-01-13 Thread Ardhani Narasimha
Interesting use case. Can you please elaborate more on this? On what criteria do you want to batch: time, count, or size? On Thu, 14 Jan 2021 at 12:15 PM, sagar wrote: > Hi Team, > > I am getting the following error while running DataStream API in > batch mode with kafka source. > I am usi

Re: Statement Sets

2021-01-13 Thread Jark Wu
No. The Kafka reader will be shared, which means the Kafka data is only read once. On Tue, 12 Jan 2021 at 03:04, Aeden Jameson wrote: > When using statement sets, if two select queries use the same table > (e.g. Kafka Topic), does each query get its own copy of data? > > Thank you, > Aeden >

Re: Flink SQL - IntervalJoin doesn't support consuming update and delete - trying to deduplicate rows

2021-01-13 Thread Jark Wu
Hi Dan, Sorry for the late reply. I guess you applied a "deduplication with keeping last row" before the interval join? That will produce an updating stream and interval join only supports append-only input. You can try to apply "deduplication with keeping *first* row" before the interval join. T
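For reference, the "keep first row" deduplication Jark mentions is the ROW_NUMBER pattern from the Flink SQL docs; because each key emits at most one row, the result stays append-only and is legal input to an interval join. An untested sketch with made-up table and column names (`events` with `id`/`payload`/`proctime` is assumed to exist):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DedupFirstRow {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // ORDER BY proctime ASC plus rn = 1 keeps the FIRST row per key,
        // so the view never emits updates or deletes for a key it has
        // already produced — unlike "keep last row", which retracts.
        tEnv.executeSql(
                "CREATE TEMPORARY VIEW deduped AS "
                + "SELECT id, payload, proctime FROM ("
                + "  SELECT *, ROW_NUMBER() OVER ("
                + "    PARTITION BY id ORDER BY proctime ASC) AS rn"
                + "  FROM events"
                + ") WHERE rn = 1");
    }
}
```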

Re: Re: Using Kafka as bounded source with DataStream API in batch mode (Flink 1.12)

2021-01-13 Thread Yun Gao
Hi Sagar, I think the problem is that the legacy sources implemented by extending SourceFunction are all defined as CONTINUOUS_UNBOUNDED when used with env.addSource(). Although there is a hacky way to add the legacy sources as BOUNDED sources [1], I think you may first have a try of the new version of
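Yun's suggestion can be sketched as follows — an untested outline with placeholder broker and topic names, using builder method names from recent Flink docs (the 1.12 builder may differ slightly):

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedKafkaBatchJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // Unlike a SourceFunction added via env.addSource(), the new
        // KafkaSource can declare itself BOUNDED, which BATCH mode needs:
        // it reads up to the offsets current at job start, then finishes.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")  // placeholder
                .setTopics("input")                  // placeholder
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setBounded(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "bounded-kafka")
                .print();
        env.execute("kafka-batch");
    }
}
```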

Re: Using Kafka as bounded source with DataStream API in batch mode (Flink 1.12)

2021-01-13 Thread sagar
Hi Ardhani, So whenever I want to run this Flink job, I will call the Java API to put the data into the four different Kafka topics; what data to put into Kafka will be coded into those APIs. Once that is complete, I want to run the Flink job on the available data in Kafka and perform bus

Re: StreamingFileSink with ParquetAvroWriters

2021-01-13 Thread Yun Gao
Hi Jan, Could you have a try by adding this dependency? org.apache.parquet parquet-avro 1.11.1 Best, Yun --Original Mail -- Sender: Jan Oelschlegel Send Date: Thu Jan 14 00:49:30 2021 Recipients: user@flink.apache.org Subject: StreamingFileSi