I am trying to migrate a custom dynamic partitioner from Flink 1.7 to Flink
1.9. The original partitioner implemented the `selectChannels` method (from
the `ChannelSelector` interface that `StreamPartitioner` implements) like this:
```java
// Original: working for Flink 1.7
@Override
public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
                            int numberOfOutputChannels) {
    // ... custom channel selection logic ...
}
```
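In Flink 1.9 the `ChannelSelector` contract changed: `int[] selectChannels(...)`
was replaced by `int selectChannel(...)`, which returns exactly one channel
index, and the channel count is now supplied through `setup(int)` (kept by
`StreamPartitioner` in the protected `numberOfChannels` field). A minimal
sketch of a port; the class name and the hash-based routing are placeholders
for the original dynamic logic:

```java
import org.apache.flink.runtime.plugable.SerializationDelegate;
import org.apache.flink.streaming.runtime.partitioner.StreamPartitioner;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Hypothetical port: StreamPartitioner already implements setup(int) and
// stores the channel count in the protected field numberOfChannels.
public class MyDynamicPartitioner<T> extends StreamPartitioner<T> {

    @Override
    public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
        // Placeholder routing: hash the record value across all channels.
        // Substitute the original dynamic selection logic here.
        return Math.abs(record.getInstance().getValue().hashCode() % numberOfChannels);
    }

    @Override
    public StreamPartitioner<T> copy() {
        return this;
    }

    @Override
    public String toString() {
        return "MY_DYNAMIC";
    }
}
```

Note that partitioners returning several channels at once have no direct
equivalent in 1.9; broadcast is handled by the dedicated broadcast
partitioner instead.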
Hi LakeShen,
I'm sorry, there is no such configuration for the json format currently.
I think it makes sense to add such a configuration, like
'format.ignore-parse-errors' in the csv format.
I created FLINK-15396 [1] to track this.
Best,
Jark
[1]: https://issues.apache.org/jira/browse/FLINK-15396
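For comparison, the existing csv switch looks roughly like this through the
Java descriptor API (a sketch; connector and schema setup are omitted):

```java
import org.apache.flink.table.descriptors.Csv;

public class CsvFormatExample {
    public static Csv parseTolerantCsv() {
        // The existing csv analogue of the requested JSON option: rows that
        // fail to parse are skipped instead of failing the job.
        return new Csv()
                .fieldDelimiter(',')
                .ignoreParseErrors();
    }
}
```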
On Thu, 26
Let's say you keep your #1, which does the hourly counting, and emit its
result to the merged new #2. The new #2 would first keep all hourly results
in state, and also keep track of whether it has already received all 24
results belonging to the same day. Once you have received all 24 counts
belonging to the same day, you can start computing the highest hourly count
for that day and emit it.
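A minimal sketch of that merged operator as a `KeyedProcessFunction`,
assuming the stream is keyed by day and that a made-up `HourlyCount` POJO
carries the hour and its count:

```java
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Merged #2/#4 sketch: collect hourly counts per day, emit the daily max
// once all 24 hours have arrived. HourlyCount is a made-up input type.
public class DailyMaxFunction
        extends KeyedProcessFunction<String, DailyMaxFunction.HourlyCount, Long> {

    public static class HourlyCount {
        public String day;  // key, e.g. "2019-12-25"
        public int hour;    // 0..23
        public long count;
    }

    private transient MapState<Integer, Long> hourlyCounts;

    @Override
    public void open(Configuration parameters) {
        hourlyCounts = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("hourly", Types.INT, Types.LONG));
    }

    @Override
    public void processElement(HourlyCount value, Context ctx, Collector<Long> out)
            throws Exception {
        hourlyCounts.put(value.hour, value.count);

        // Check whether all 24 hours of this day are present.
        int received = 0;
        long max = Long.MIN_VALUE;
        for (Long c : hourlyCounts.values()) {
            received++;
            max = Math.max(max, c);
        }
        if (received == 24) {
            out.collect(max);   // highest hourly count of the day
            hourlyCounts.clear();
        }
    }
}
```

Since emission is driven purely by having seen all 24 hourly results, no
timers are strictly needed; a timer would only serve as a safety net for days
that never receive all 24 counts.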
I'm not sure I can see how it is simpler. #2 is a time window per day that
emits the highest hour for that day. #4 is not a time window; it keeps a
history of several days. If I want to put the logic of #2 there, I will need
to manage it with timers, correct?
On Thu, Dec 26, 2019 at 6:28 AM Kurt Young wrote
Hi Joe,
Your requirement is effectively exactly-once delivery to an external sink. I
think your option 4 with TwoPhaseCommitSinkFunction is the right way to go.
Unfortunately I am not very familiar with this part, so I cannot give you
specific suggestions for using it, especially for your concern of stor
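For what it's worth, a skeleton of option 4 looks roughly like this; the
`MyTxn` handle and the staging/commit comments are placeholders for whatever
your external store offers, not a working sink:

```java
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

// Sketch of an exactly-once sink. MyTxn is a placeholder for whatever handle
// the external system uses to stage uncommitted writes.
public class ExactlyOnceSink
        extends TwoPhaseCommitSinkFunction<String, ExactlyOnceSink.MyTxn, Void> {

    public static class MyTxn {
        public String id; // e.g. a staging-location or external transaction id
    }

    public ExactlyOnceSink() {
        super(TypeInformation.of(MyTxn.class).createSerializer(new ExecutionConfig()),
              VoidSerializer.INSTANCE);
    }

    @Override
    protected MyTxn beginTransaction() {
        // open a fresh staging area in the external system
        return new MyTxn();
    }

    @Override
    protected void invoke(MyTxn txn, String value, Context context) {
        // stage the record under txn; nothing is visible to readers yet
    }

    @Override
    protected void preCommit(MyTxn txn) {
        // flush staged writes; called as part of the checkpoint
    }

    @Override
    protected void commit(MyTxn txn) {
        // atomically publish the staged writes; must be idempotent,
        // because commit can be retried after a failure
    }

    @Override
    protected void abort(MyTxn txn) {
        // discard the staged writes
    }
}
```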
Hi,
You can merge the logic of #2 into #4; it will be much simpler.
Best,
Kurt
On Wed, Dec 25, 2019 at 7:36 PM Avi Levi wrote:
> Hi ,
>
> I have the following pipeline :
> 1. single hour window that counts the number of records
> 2. single day window that accepts the aggregated data from #1
Hi community, when I write Flink DDL SQL like this:
CREATE TABLE kafka_src (
  id VARCHAR,
  a VARCHAR,
  b TIMESTAMP,
  c TIMESTAMP
) WITH (
  ...
  'format.type' = 'json',
  'format.property-version' = '1',
  'format.derive-schema' = 'true',
  'update-mode' = 'append'
);
If the me
If I understand correctly, different Kafka partitions would be consumed by
different source tasks, each with its own watermark progress. And the Flink
framework would align the different watermarks by only emitting the smallest
watermark among them, so events from slow partitions would not be dropped as
late.
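A toy illustration of that min-based alignment rule (plain Java, not Flink's
internal code):

```java
// An operator's output watermark is the minimum over its inputs, so one
// slow Kafka partition holds the watermark back for everything downstream
// rather than having its events dropped as late.
public class WatermarkAlignmentToy {
    public static void main(String[] args) {
        long[] inputWatermarks = {42_000L, 10_000L, 17_500L}; // per-channel progress
        long output = Long.MAX_VALUE;
        for (long w : inputWatermarks) {
            output = Math.min(output, w);
        }
        System.out.println(output); // 10000: bounded by the slowest input
    }
}
```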
Hi Ruibin,
Are you asking how to generate watermarks per Kafka partition?
Flink provides Kafka-partition-aware watermark generation. [1]
Best,
Vino
[1]:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/event_timestamps_watermarks.html#timestamps-per-kafka-partition
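Concretely, per [1], assigning the watermark extractor on the Kafka consumer
itself (rather than on the resulting stream) makes generation
partition-aware; a sketch where the topic, bootstrap address, and timestamp
extraction are placeholders:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class PerPartitionWatermarks {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder

        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props); // placeholder topic

        // Assigned on the consumer, so each Kafka partition gets its own
        // watermark generator; the source emits the min across its partitions.
        consumer.assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(10)) {
                    @Override
                    public long extractTimestamp(String element) {
                        return Long.parseLong(element); // placeholder: records are epoch millis
                    }
                });

        env.addSource(consumer).print();
        env.execute("per-partition watermarks");
    }
}
```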
邢瑞斌 on Wed, Dec 25, 2019
Happy Holidays everyone!
tl;dr: I need to aggregate movie rental information that is being
stored in one DynamoDB table and store a running total of the
aggregation in another table. How do I ensure exactly-once
aggregation?
I currently store movie rental information in a DynamoDB table named
Movie
Thanks Vino and Rafi for your references.
Regarding push gateway recommendations for batch jobs: I am following this
reference (https://prometheus.io/docs/practices/pushing/).
The scenario I have is that we start Flink apps on EMR whenever we need
them. Sometimes the task manager gets killed an
Hi,
I'm trying to use Kafka as an event store, and I want to create several
partitions to improve read/write throughput. Occasionally I need to rewind
the offset to a previous position for recomputation. Since order isn't
guaranteed across partitions in Kafka, does this mean that Flink won't
produce the same results after a rewind?
Hi,
I have the following pipeline:
1. a single-hour window that counts the number of records
2. a single-day window that accepts the aggregated data from #1 and emits the
highest hour count of that day
3. a union of #1 + #2
4. a logic operator that accepts the data from #3 and keeps a listState of #2
and a
Understood, thank you for the quick response!
Thanks,
Li
Xintong Song wrote on Wed, Dec 25, 2019 at 5:05 PM:
> Hi Li,
>
> It is true that currently all the task managers have the same heap size,
> and it is fixed once they have started. Unfortunately, your needs cannot be
> satisfied at the moment.
>
> Heap size
Hi Li,
It is true that currently all the task managers have the same heap size,
and it is fixed once they have started. Unfortunately, your needs cannot be
satisfied at the moment.
The heap size of a task manager cannot be changed once it has started,
because Flink task managers run in JVMs, and a JVM does not support
changing its maximum heap size at runtime.
Hi,
Take a look here: https://github.com/eastcirclek/flink-service-discovery
I used it successfully quite a while ago, so things might have changed
since.
Thanks,
Rafi
On Wed, Dec 25, 2019, 05:54 vino yang wrote:
> Hi Mans,
>
> IMO, the metrics reporter mechanism does not depend on any dep
Hi,
Greetings! I hope this is the proper place to ask questions; apologies if not.
We have a shared Flink cluster running with Docker and want to set a different
heap size per task (some tasks require a larger heap size, while most tasks
only need a little). Is that feasible?
I've gone through [1], [2] and