As I understand it, the Flink Kafka Producer may emit duplicates to Kafka
topics.
How can I deduplicate these messages when reading them back with Flink (via
the Flink Kafka Consumer)?
For example, is there any out-of-the-box support for deduplicating messages,
e.g. by incorporating something like "
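Absent built-in support, one common pattern is to key the stream by a message ID and drop any record whose ID has already been seen, holding the seen-marker in keyed state. Below is a minimal plain-Java sketch of that core logic only — the `HashSet` stands in for what would be per-key, checkpointed `ValueState` in a Flink operator, and the id/payload layout is a hypothetical example, not Flink's API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    // Drop messages whose id was already seen. In a Flink job this Set would
    // be replaced by keyed state so it survives failures and scales out.
    static List<String> dedupById(List<String[]> messages) {
        Set<String> seen = new HashSet<>();   // stands in for keyed state
        List<String> out = new ArrayList<>();
        for (String[] m : messages) {         // m = {id, payload}
            if (seen.add(m[0])) {             // add() returns false for duplicates
                out.add(m[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> in = Arrays.asList(
                new String[]{"1", "a"},
                new String[]{"2", "b"},
                new String[]{"1", "a"});      // duplicate of id "1"
        System.out.println(dedupById(in));    // [a, b]
    }
}
```

Note that an unbounded seen-set grows forever; in practice you would expire entries (e.g. with state TTL) once duplicates can no longer arrive.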
The case I described was only an experiment, but data skew does happen in
production. The current implementation blocks watermark emission to
downstream operators until all partitions move forward, which has a great
impact on latency. It may be a good idea to expose an API that lets users decide what th
Hi Nico,
Recently I've tried queryable state a bit differently: using a ValueState
whose value is a util.ArrayList, together with a ValueSerializer for
util.ArrayList, and it works as expected.
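As a rough illustration of what a ValueState holding a util.ArrayList gives you per key — read the current list, append, write it back — here is a plain-Java sketch with the keyed state simulated by a Map (this is not the Flink API itself):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListValueStateSketch {
    // Simulates ValueState<ArrayList<String>> per key:
    // the Map plays the role of the keyed state backend.
    private final Map<String, ArrayList<String>> state = new HashMap<>();

    void add(String key, String value) {
        ArrayList<String> list =
                state.computeIfAbsent(key, k -> new ArrayList<>()); // value() + default
        list.add(value);                                            // update()
    }

    List<String> get(String key) {
        return state.getOrDefault(key, new ArrayList<>());
    }

    public static void main(String[] args) {
        ListValueStateSketch s = new ListValueStateSketch();
        s.add("host-1", "a");
        s.add("host-1", "b");
        System.out.println(s.get("host-1")); // [a, b]
    }
}
```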
The non-working example you can browse here:
https://github.com/dawidwys/flink-intro/tree/c66f01117b0fe3c0adc
Hi Fabian,
I have opened a ticket for that, thanks.
I have another question: now that I have obtained the proper local
grouping, I did an aggregation of type [T] -> U, where one aggregated
object of type U contains the information of zero or more Ts. The Us are
still tied to the hostname, and
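An aggregation of that shape can be sketched in plain Java as grouping the Ts by hostname and collapsing each group into one U. The `T` and `U` record definitions below are hypothetical stand-ins for the poster's types:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AggregationSketch {
    // Hypothetical T: an event carrying a hostname and a value.
    record T(String hostname, int value) {}

    // Hypothetical U: an aggregate over zero or more Ts from one host.
    record U(String hostname, int count, int sum) {}

    // [T] -> U per hostname: collect each host's Ts into a single U.
    static Map<String, U> aggregate(List<T> events) {
        return events.stream().collect(Collectors.groupingBy(
                T::hostname,
                Collectors.collectingAndThen(
                        Collectors.toList(),
                        ts -> new U(ts.get(0).hostname(),
                                    ts.size(),
                                    ts.stream().mapToInt(T::value).sum()))));
    }

    public static void main(String[] args) {
        Map<String, U> out = aggregate(List.of(
                new T("host-1", 2), new T("host-1", 3), new T("host-2", 5)));
        System.out.println(out.get("host-1")); // U[hostname=host-1, count=2, sum=5]
    }
}
```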
Hi Sebastian,
I'm not aware of a better way to implement this in Flink. You could
implement your own XmlInputFormat using Flink's InputFormat abstractions,
but you would end up with almost exactly the same code as Mahout / Hadoop.
I wonder why the decompression with the XmlInputFormat doesn't work