Manual error/recovery handling in Flink applications

2019-01-30 Thread Patrick Fial
I am working on an application based on Apache Flink, which makes use of Apache Kafka for input and output. I have the requirement that all incoming messages received via Kafka must be processed in order, as well as safely stored in a persistence layer (database), and no message must get lost. The
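The question is truncated in the archive, but for the "no message must get lost" part, a minimal sketch of the usual starting point, assuming a standard checkpointed job (names and the interval are illustrative):

```scala
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

object CheckpointedJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // With checkpointing enabled, the Kafka source stores its offsets in
    // Flink's checkpoints and rewinds to them on failure, so no message
    // is lost; per-key ordering follows from keyBy on the keyed stream.
    env.enableCheckpointing(60000L, CheckpointingMode.EXACTLY_ONCE)
    // ... Kafka source, keyed processing, and the database sink go here
    env.execute("checkpointed-job")
  }
}
```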

Re: Filter Date type in Table API

2019-01-30 Thread Zhenghua Gao
Just try: filter("f_date <= '1998-10-02'.toDate")
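For context, a self-contained sketch of that suggestion against the 1.7-era Table API; the table and field contents are made up for illustration:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._

object DateFilterExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(env)

    val rows = env.fromElements(
      ("a", java.sql.Date.valueOf("1998-09-30")),
      ("b", java.sql.Date.valueOf("1998-12-01")))
    val table = tEnv.fromDataStream(rows, 'name, 'f_date)

    // '1998-10-02'.toDate turns the string literal into a SQL DATE, so
    // the comparison runs on Date values rather than on raw strings
    val filtered = table.filter("f_date <= '1998-10-02'.toDate")
    filtered.toAppendStream[(String, java.sql.Date)].print()
    env.execute("date-filter")
  }
}
```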

Re: Writing a custom Rocksdb statistics collector

2019-01-30 Thread Yun Tang
Hi Harshvardhan First of all, 'DBOptions' is not serializable, so I think you cannot include it in the source constructor. I am also wondering whether the given `DBOptions` could be used to query RocksDB's statistics, since they are not the actual options used to open RocksDB. We have tried to report RocksDB's stati
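A sketch of the serializability point, assuming a custom source like the one in the thread: build the non-serializable `DBOptions` in open() rather than in the constructor. The actual statistics-reporting calls are left out because the message is truncated here:

```scala
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.source.{RichSourceFunction, SourceFunction}
import org.rocksdb.DBOptions

class RocksDbStatsSource extends RichSourceFunction[String] {
  // @transient + created in open(): the options object lives only on the
  // task manager and is never shipped with the serialized function
  @transient private var options: DBOptions = _
  @volatile private var running = true

  override def open(parameters: Configuration): Unit = {
    options = new DBOptions()
  }

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    while (running) {
      // query and emit statistics here (omitted; see the thread)
      Thread.sleep(10000L)
    }
  }

  override def cancel(): Unit = running = false
}
```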

Why OnDataStream in Flink accepts partial functions but DataStream does not

2019-01-30 Thread Renkai
As shown in https://ci.apache.org/projects/flink/flink-docs-master/dev/scala_api_extensions.html , we can use Scala partial functions by importing org.apache.flink.streaming.api.scala.extensions._ and replacing .map with .mapWith, but the signature of def mapWith[R: TypeInformation](fun: T => R): DataSt
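The usual explanation is that DataStream.map is overloaded (it takes either a MapFunction or a Scala function), so the compiler cannot resolve a bare pattern-matching literal against it, while mapWith has a single non-overloaded signature. A sketch of the extension in use, following the linked docs:

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.extensions._

object PartialFunctionExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val pairs: DataStream[(String, Int)] = env.fromElements(("a", 1), ("b", 2))

    // a pattern-matching function literal: rejected by the overloaded
    // .map, accepted by .mapWith
    val labels = pairs.mapWith { case (name, count) => s"$name=$count" }

    labels.print()
    env.execute("mapWith-example")
  }
}
```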

Re: About KafkaConsumer and WM'ing and EventTime characteristics

2019-01-30 Thread Jamie Grier
Vishal, the answer to your question about IngestionTime is "no". Ingestion time in this context means the time the data was read by Flink, not the time it was written to Kafka. To get the effect you're looking for you have to set TimeCharacteristic.EventTime and follow the instructions here: https
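A minimal sketch of that setup; the event type and the five-second out-of-orderness bound are illustrative, not from the thread:

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object EventTimeSetup {
  // illustrative event type; the real schema is not in the thread
  case class Event(id: String, eventTime: Long)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val events: DataStream[Event] = env.fromElements(Event("a", 1000L), Event("b", 2000L))
    val withTimestamps = events.assignTimestampsAndWatermarks(
      // allow events to arrive up to 5 seconds out of order
      new BoundedOutOfOrdernessTimestampExtractor[Event](Time.seconds(5)) {
        override def extractTimestamp(e: Event): Long = e.eventTime
      })

    withTimestamps.print()
    env.execute("event-time-setup")
  }
}
```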

Writing a custom Rocksdb statistics collector

2019-01-30 Thread Harshvardhan Agrawal
Hi, I am currently trying to integrate RocksDB statistics in my pipeline. The basic idea is that we want to pass RocksDB stats through the same pipeline that is doing our processing and write them to Elasticsearch so that we can visualize them in Kibana. I have written a custom source function t
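The message is truncated, but the Elasticsearch side would typically look like the sketch below (flink-connector-elasticsearch6 API; the index name, document layout, and stand-in source are made up):

```scala
import java.util

import org.apache.http.HttpHost
import org.apache.flink.api.common.functions.RuntimeContext
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.elasticsearch.{ElasticsearchSinkFunction, RequestIndexer}
import org.apache.flink.streaming.connectors.elasticsearch6.ElasticsearchSink
import org.elasticsearch.client.Requests

object StatsToEs {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // stand-in for the custom RocksDB-statistics source from the thread
    val stats: DataStream[String] = env.fromElements("block-cache-hit=42")

    val hosts = util.Arrays.asList(new HttpHost("localhost", 9200, "http"))
    val sink = new ElasticsearchSink.Builder[String](
      hosts,
      new ElasticsearchSinkFunction[String] {
        override def process(stat: String, ctx: RuntimeContext,
                             indexer: RequestIndexer): Unit = {
          // wrap each statistic in a one-field document for Kibana
          val doc = new util.HashMap[String, String]()
          doc.put("stat", stat)
          indexer.add(Requests.indexRequest().index("rocksdb-stats")
            .`type`("stats").source(doc))
        }
      }).build()

    stats.addSink(sink)
    env.execute("rocksdb-stats-to-es")
  }
}
```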

Re: getting duplicate messages from duplicate jobs

2019-01-30 Thread Selvaraj chennappan
I have faced the same problem: https://stackoverflow.com/questions/54286486/two-kafka-consumer-in-same-group-and-one-partition On Wed, Jan 30, 2019 at 6:11 PM Avi Levi wrote: > Ok, if you guys think it should be like that then so be it. All I am > saying is that it is not standard behaviour from

Apache Flink - event time based watermark generators - optimal strategy

2019-01-30 Thread simpleusr
I am a Flink newbie trying to apply windowing. My source is Kafka and my model does not contain event time info, so I am trying to use Kafka timestamps with the assignTimestampsAndWatermarks() method. I implemented two timestamp assigners as below. public class TimestampAssigner1 implements Assign
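For reference, one possible periodic assigner over the Kafka-attached timestamps (a sketch, not the poster's truncated code): when the job runs with TimeCharacteristic.EventTime, the Kafka consumer hands each record's Kafka timestamp to extractTimestamp as previousElementTimestamp.

```scala
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.watermark.Watermark

class KafkaTimestampAssigner[T] extends AssignerWithPeriodicWatermarks[T] {
  private var maxTimestamp = Long.MinValue

  override def extractTimestamp(element: T, previousElementTimestamp: Long): Long = {
    // previousElementTimestamp carries the Kafka record timestamp here
    maxTimestamp = math.max(maxTimestamp, previousElementTimestamp)
    previousElementTimestamp
  }

  // trail the highest seen timestamp by 5 seconds (illustrative bound)
  override def getCurrentWatermark: Watermark =
    new Watermark(if (maxTimestamp == Long.MinValue) Long.MinValue
                  else maxTimestamp - 5000L)
}
```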

Re: UDAF Flink-SQL returning null leads to checkpoint failures

2019-01-30 Thread Timo Walther
Hi Henry, could you share a small reproducible example? From what I see you are using a custom aggregate function with a case class inside, right? Flink's case class serializer does not support null because the usage of `null` is also not very Scala-like. Use a `Row` type for supporting nul
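A sketch of that suggestion, assuming a trivial "latest value" aggregate (the real UDAF is not shown in the thread): a `Row` accumulator tolerates null fields where a Scala case class does not.

```scala
import java.lang.{Long => JLong}

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.table.api.Types
import org.apache.flink.table.functions.AggregateFunction
import org.apache.flink.types.Row

class LatestNullable extends AggregateFunction[JLong, Row] {
  // a one-field Row; the field starts as null and may stay null
  override def createAccumulator(): Row = new Row(1)

  def accumulate(acc: Row, value: JLong): Unit = acc.setField(0, value)

  // may legitimately return null when no value was determined
  override def getValue(acc: Row): JLong = acc.getField(0).asInstanceOf[JLong]

  // tell the planner the accumulator is a Row of one nullable LONG
  override def getAccumulatorType: TypeInformation[Row] = Types.ROW(Types.LONG)
}
```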

Re: getting duplicate messages from duplicate jobs

2019-01-30 Thread Avi Levi
Ok, if you guys think it should be like that then so be it. All I am saying is that it is not standard behaviour for a Kafka consumer, at least according to the documentation. I understand that Flink implements things differently and all I

UDAF Flink-SQL returning null leads to checkpoint failures

2019-01-30 Thread 徐涛
Hi Experts, In my self-defined UDAF, I found that returning a null value causes the checkpoint to fail; the following is the error log: I think it is quite a common case to return a null value in a UDAF, because sometimes no value can be determined, so why does Flink have such a limit

Workaround for Kafka idle partitions and watermarks

2019-01-30 Thread morin . david . bzh
Hello, I've run into this issue in production: https://issues.apache.org/jira/browse/FLINK-5479 One topic contains an idle partition and the whole pipeline is quite messed up. I use aggregations based on these watermarks and the trigger is never fired. Is it possible to define a workaround in waiti
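As a workaround (my assumption, not something proposed in the truncated thread), a periodic assigner can bound the watermark by processing time so an idle partition cannot hold it back indefinitely, at the cost of declaring slow events late:

```scala
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.watermark.Watermark

// apply per partition via consumer.assignTimestampsAndWatermarks(...);
// maxDelayMs trades correctness (late events) for watermark progress
class IdleTolerantAssigner[T](extract: T => Long, maxDelayMs: Long)
    extends AssignerWithPeriodicWatermarks[T] {
  private var maxEventTime = Long.MinValue

  override def extractTimestamp(element: T, previous: Long): Long = {
    val ts = extract(element)
    maxEventTime = math.max(maxEventTime, ts)
    ts
  }

  override def getCurrentWatermark: Watermark = {
    // advance with the wall clock when no events arrive; otherwise
    // follow the highest event timestamp seen so far
    val fallback = System.currentTimeMillis() - maxDelayMs
    new Watermark(math.max(maxEventTime, fallback))
  }
}
```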

Re: About KafkaConsumer and WM'ing and EventTime characteristics

2019-01-30 Thread Vishal Santoshi
Thank you. This though is a little different. The producer of the Kafka message attaches a timestamp (https://issues.apache.org/jira/browse/KAFKA-2511). I do not see how I can get to that timestamp through any stream abstraction over the FlinkKafkaConsumer API, even though it is available here https
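One way that should work (assuming TimeCharacteristic.EventTime is set): the FlinkKafkaConsumer attaches the Kafka record timestamp to each stream record, and a downstream ProcessFunction can read it back via ctx.timestamp(). A sketch:

```scala
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

class KafkaTimestampReader extends ProcessFunction[String, (String, Long)] {
  override def processElement(value: String,
                              ctx: ProcessFunction[String, (String, Long)]#Context,
                              out: Collector[(String, Long)]): Unit = {
    // the record timestamp set by the Kafka source; null if none was set
    val kafkaTs: java.lang.Long = ctx.timestamp()
    out.collect((value, if (kafkaTs == null) -1L else kafkaTs))
  }
}
```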