Re: Duplicates in Flink

2015-09-03 Thread Rico Bergmann
The KafkaSink is the last step in my program, after the 2nd deduplication. I could not yet track down where the duplicates show up. That's a bit difficult to find out, but I'm trying to find it...

Re: Duplicates in Flink

2015-09-03 Thread Stephan Ewen
Can you tell us where the KafkaSink comes into play? At what point do the duplicates come up?

Re: Duplicates in Flink

2015-09-03 Thread Rico Bergmann
No. I mean the KafkaSink. A bit more insight into my program: I read from a Kafka topic with FlinkKafkaConsumer082, then hash-partition the data, then I do a deduplication (it does not eliminate all duplicates, though). Then some computation, and afterwards again a deduplication (group by message in a window ...
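The core idea behind the windowed deduplication step Rico describes can be sketched in plain Java. This is illustrative only, not Flink API; the class and method names are made up for this sketch. Within one window (or batch), keep the first occurrence of each message and drop later copies:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch of per-window deduplication (hypothetical names, not Flink API).
public class WindowDedup {

    // Keep the first occurrence of each element, preserving arrival order.
    public static List<String> dedup(List<String> windowContents) {
        return new ArrayList<>(new LinkedHashSet<>(windowContents));
    }

    public static void main(String[] args) {
        List<String> window = List.of("m1", "m2", "m1", "m3", "m2");
        System.out.println(dedup(window)); // prints [m1, m2, m3]
    }
}
```

Note this only removes duplicates that land in the same window; two copies of a message that fall into different windows both survive, which matches Rico's remark that the step "does not eliminate all duplicates".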

Re: Duplicates in Flink

2015-09-03 Thread Stephan Ewen
Do you mean the KafkaSource? Which source are you using: the 0.9.1 FlinkKafkaConsumer082 or the KafkaSource?

Re: Duplicates in Flink

2015-09-03 Thread Rico Bergmann
Hi! Testing it with the current 0.10 snapshot is not easily possible at the moment. But I deactivated checkpointing in my program and still get duplicates in my output. So it seems they do not come only from the checkpointing feature, right? Maybe the KafkaSink is responsible for this? (Just my guess.) Cheers, Rico
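Rico's observation that duplicates survive even with checkpointing disabled is consistent with at-least-once delivery on the producing side. A minimal sketch (plain Java, not Flink or Kafka API; all names here are invented for illustration): if a write succeeds but the acknowledgement is lost, a retrying producer cannot distinguish "lost ack" from "lost write", so it sends again and the topic ends up with two copies.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of why an at-least-once sink can duplicate records,
// independently of checkpointing. Hypothetical names throughout.
public class RetrySinkDemo {

    // A fake "topic" that records everything successfully written to it.
    static class FakeTopic {
        final List<String> records = new ArrayList<>();
        boolean ackLostOnce = true;

        // Returns true if the producer saw an acknowledgement.
        boolean send(String record) {
            records.add(record);   // the write itself succeeds
            if (ackLostOnce) {     // ...but the ack is lost once
                ackLostOnce = false;
                return false;
            }
            return true;
        }
    }

    // At-least-once delivery: retry until an ack is seen.
    static void sendAtLeastOnce(FakeTopic topic, String record) {
        while (!topic.send(record)) {
            // No ack: the producer cannot tell whether the write or only
            // the ack was lost, so it retries and may duplicate the record.
        }
    }

    public static void main(String[] args) {
        FakeTopic topic = new FakeTopic();
        sendAtLeastOnce(topic, "msg-1");
        sendAtLeastOnce(topic, "msg-2");
        System.out.println(topic.records); // prints [msg-1, msg-1, msg-2]
    }
}
```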

Re: Duplicates in Flink

2015-09-01 Thread Aljoscha Krettek
Hi Rico, unfortunately the 0.9 branch still seems to have problems with exactly-once processing and checkpointed operators. We reworked how checkpoints are handled for the 0.10 release, so it should work well there. Could you maybe try running on the 0.10-SNAPSHOT release and see if the problem persists?

Duplicates in Flink

2015-09-01 Thread Dipl.-Inf. Rico Bergmann
Hi! I still have an issue... I was now using 0.9.1 and the new Kafka connector, but I still get duplicates in my Flink program. Here's the relevant part:

final FlinkKafkaConsumer082<String> kafkaSrc =
    new FlinkKafkaConsumer082<>(kafkaTopicIn, new SimpleStringSchema(), properties);