Migrate custom partitioner from Flink 1.7 to Flink 1.9

2019-12-25 Thread Salva Alcántara
I am trying to migrate a custom dynamic partitioner from Flink 1.7 to Flink 1.9. The original partitioner implemented the `selectChannels` method within the `StreamPartitioner` interface like this: ```java // Original: working for Flink 1.7 //@Override public int[] selectChannels(Seria

Re: Flink 1.9 SQL Kafka Connector,Json format,how to deal with not json message?

2019-12-25 Thread Jark Wu
Hi LakeShen, I'm sorry, there is no such configuration for the json format currently. I think it makes sense to add one, like 'format.ignore-parse-errors' in the csv format. I created FLINK-15396 [1] to track this. Best, Jark [1]: https://issues.apache.org/jira/browse/FLINK-15396 On Thu, 26
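The behaviour such an option would enable can be sketched as a lenient deserializer that drops unparsable messages instead of failing the job. Everything below is illustrative (a bare integer parse stands in for JSON parsing); it is not the flink-json implementation:

```java
// Sketch of an 'ignore-parse-errors' style option: bad records are
// skipped (null) instead of failing the job. The parse logic here is a
// placeholder, not the real json format.
class LenientDeserializer {
    private final boolean ignoreParseErrors;

    LenientDeserializer(boolean ignoreParseErrors) {
        this.ignoreParseErrors = ignoreParseErrors;
    }

    /** Returns the parsed value, or null for a skipped corrupt record. */
    Integer deserialize(String message) {
        try {
            return Integer.parseInt(message.trim());
        } catch (NumberFormatException e) {
            if (ignoreParseErrors) {
                return null; // skip the corrupt message
            }
            throw e; // current behaviour: the failure surfaces
        }
    }
}
```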

Re: testing - syncing data timeline

2019-12-25 Thread Kurt Young
Let's say you keep your #1, which does hourly counting, and emits its result to the merged new #2. The new #2 would first keep all hourly results in state, and also keep track of whether it has already received all 24 results belonging to the same day. Once you have received all 24 counts belonging to the same day, you can st
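The merge Kurt describes can be sketched without Flink: per-day hourly counts held in state (a plain map here, keyed state in a real operator), with the daily result emitted only once all 24 hours have arrived. Class and method names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the merged #2/#4: buffer hourly counts per day and emit the
// daily total when hour 24 of 24 arrives. A HashMap stands in for
// Flink keyed state.
class DailyAggregator {
    private final Map<String, Long[]> hourlyByDay = new HashMap<>();

    /** Returns the daily total once all 24 hours are present, else null. */
    Long onHourlyCount(String day, int hour, long count) {
        Long[] hours = hourlyByDay.computeIfAbsent(day, d -> new Long[24]);
        hours[hour] = count;
        long total = 0;
        for (Long h : hours) {
            if (h == null) {
                return null; // still waiting for some hour of this day
            }
            total += h;
        }
        hourlyByDay.remove(day); // clear state once the day is complete
        return total;
    }
}
```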

Re: testing - syncing data timeline

2019-12-25 Thread Avi Levi
Not sure that I can see how it is simpler. #2 is a daily time window; it emits the highest hour for that day. #4 is not a time window; it keeps a history of several days. If I want to put the logic of #2 there, I will need to manage it with timers, correct? On Thu, Dec 26, 2019 at 6:28 AM Kurt Young wrote

Re: Aggregating Movie Rental information in a DynamoDB table using DynamoDB streams and Flink

2019-12-25 Thread Zhijiang
Hi Joe, Your requirement is effectively exactly-once delivery to an external sink. I think your option 4 with TwoPhaseCommitSinkFunction is the right way to go. Unfortunately I am not quite familiar with this part, so I cannot give you specific suggestions for using it, especially for your concern of stor

Re: testing - syncing data timeline

2019-12-25 Thread Kurt Young
Hi, You can merge the logic of #2 into #4; it will be much simpler. Best, Kurt On Wed, Dec 25, 2019 at 7:36 PM Avi Levi wrote: > Hi , > > I have the following pipeline : > 1. single hour window that counts the number of records > 2. single day window that accepts the aggregated data from #1

Flink 1.9 SQL Kafka Connector,Json format,how to deal with not json message?

2019-12-25 Thread LakeShen
Hi community, when I write the Flink DDL SQL like this: CREATE TABLE kafka_src ( id varchar, a varchar, b TIMESTAMP, c TIMESTAMP ) with ( ... 'format.type' = 'json', 'format.property-version' = '1', 'format.derive-schema' = 'true', 'update-mode' = 'append' ); If the me

Re: Rewind offset to a previous position and ensure certainty.

2019-12-25 Thread Zhijiang
If I understood correctly, different Kafka partitions would be consumed by different source tasks with different watermark progress. The Flink framework would align the different watermarks, only outputting the smallest watermark among them, so the events from slow partitions would not be d
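The alignment described here, advancing only to the minimum of the per-channel watermarks so a slow partition holds the rest back, can be sketched as follows (a simplified illustration, not Flink's internal implementation):

```java
import java.util.Arrays;

// Sketch of watermark alignment across input channels: the operator's
// output watermark is the MINIMUM of the latest watermark seen on each
// channel, so a lagging partition dominates.
class WatermarkAligner {
    private final long[] channelWatermarks;

    WatermarkAligner(int channels) {
        channelWatermarks = new long[channels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE); // nothing seen yet
    }

    /** Record a watermark from one channel; return the aligned minimum. */
    long onWatermark(int channel, long watermark) {
        // Watermarks only move forward within a channel.
        channelWatermarks[channel] = Math.max(channelWatermarks[channel], watermark);
        long min = Long.MAX_VALUE;
        for (long w : channelWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }
}
```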

Re: Rewind offset to a previous position and ensure certainty.

2019-12-25 Thread vino yang
Hi Ruibin, Are you looking for how to generate watermarks per Kafka partition? Flink provides Kafka-partition-aware watermark generation. [1] Best, Vino [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/event_timestamps_watermarks.html#timestamps-per-kafka-partition 邢瑞斌 wrote on Wed, Dec 25, 2019

Aggregating Movie Rental information in a DynamoDB table using DynamoDB streams and Flink

2019-12-25 Thread Joe Hansen
Happy Holidays everyone! tl;dr: I need to aggregate movie rental information that is being stored in one DynamoDB table and store a running total of the aggregation in another table. How do I ensure exactly-once aggregation? I currently store movie rental information in a DynamoDB table named Movie
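The two-phase-commit idea suggested later in the thread can be sketched in miniature: writes are buffered per transaction and only applied to the visible table on commit, so a replayed commit is a no-op. An in-memory map stands in for DynamoDB, and all names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the TwoPhaseCommitSinkFunction idea for a running-totals
// table: buffer deltas per transaction, apply them once on commit.
// The HashMap "table" stands in for the external DynamoDB table.
class TwoPhaseTotalsSink {
    private final Map<String, Long> table = new HashMap<>();          // committed totals
    private final Map<Long, Map<String, Long>> pending = new HashMap<>(); // txnId -> deltas

    /** Buffer a rental-count delta inside the open transaction. */
    void write(long txnId, String movieId, long delta) {
        pending.computeIfAbsent(txnId, t -> new HashMap<>())
               .merge(movieId, delta, Long::sum);
    }

    /** Apply the buffered deltas exactly once, then forget them. */
    void commit(long txnId) {
        Map<String, Long> deltas = pending.remove(txnId);
        if (deltas == null) {
            return; // already committed: a replayed commit changes nothing
        }
        deltas.forEach((movie, delta) -> table.merge(movie, delta, Long::sum));
    }

    Long total(String movieId) {
        return table.get(movieId);
    }
}
```

The real sink must also persist the pending buffer durably at pre-commit (checkpoint) time, which is the part this sketch omits.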

Re: Apache Flink - Flink Metrics collection using Prometheus on EMR from streaming mode

2019-12-25 Thread M Singh
Thanks Vino and Rafi for your references. Regarding push gateway recommendations for batch - I am following this reference (https://prometheus.io/docs/practices/pushing/). The scenario that I have is that we start Flink Apps on EMR whenever we need them. Sometimes the task manager gets killed an

Rewind offset to a previous position and ensure certainty.

2019-12-25 Thread 邢瑞斌
Hi, I'm trying to use Kafka as an event store and I want to create several partitions to improve read/write throughput. Occasionally I need to rewind offset to a previous position for recomputing. Since order isn't guaranteed among partitions in Kafka, does this mean that Flink won't produce the s

testing - syncing data timeline

2019-12-25 Thread Avi Levi
Hi , I have the following pipeline : 1. single hour window that counts the number of records 2. single day window that accepts the aggregated data from #1 and emits the highest hour count of that day 3. union #1 + #2 4. Logic operator that accepts the data from #3 and keep a listState of #2 and a

Re: question: jvm heap size per task?

2019-12-25 Thread Li Zhao
Understood, thank you for the quick response! Thanks, Li Xintong Song 于2019年12月25日周三 下午5:05写道: > Hi Li, > > It is true that currently all the task managers have the same heap size, > and it's fixed ever since started. Unfortunately, your needs cannot be > satisfied at the moment. > > Heap size

Re: question: jvm heap size per task?

2019-12-25 Thread Xintong Song
Hi Li, It is true that currently all the task managers have the same heap size, and it is fixed once started. Unfortunately, your needs cannot be satisfied at the moment. The heap size of task managers cannot be changed once started, because Flink task managers run in JVMs and the JVM does not suppo

Re: Apache Flink - Flink Metrics collection using Prometheus on EMR from streaming mode

2019-12-25 Thread Rafi Aroch
Hi, Take a look here: https://github.com/eastcirclek/flink-service-discovery I used it successfully quite a while ago, so things might have changed since. Thanks, Rafi On Wed, Dec 25, 2019, 05:54 vino yang wrote: > Hi Mans, > > IMO, the mechanism of metrics reporter does not depend on any dep

question: jvm heap size per task?

2019-12-25 Thread Li Zhao
Hi, Greetings, hope this is the proper place to ask questions; apologies if not. We have a shared Flink cluster running with Docker and want to set a different heap size per task (some tasks require a larger heap size, while most only need a little); is it feasible? I've gone through [1], [2] and