So, the producer is run manually (at the moment) from the command line, one message at a time. Kafka's own tooling (using a different consumer group) shows that a message is added to the topic each time.
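For concreteness, the producer is essentially the following one-shot program. This is a minimal sketch of what it does; the class name, broker address, topic name "my-topic" and payload are placeholders, and the UUID key is the addition described below:

  import java.util.Properties;
  import java.util.UUID;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  public class OneShotProducer {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");
          props.put("key.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");
          props.put("value.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");
          // one send per invocation; the value differs on every run
          try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
              producer.send(new ProducerRecord<>("my-topic",
                  UUID.randomUUID().toString(),   // the UUID key mentioned below
                  args[0]));
          } // close() flushes any pending sends
      }
  }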
Since my last post, I have also added a UUID as the key, and that didn't make a difference, so you are likely correct that there is no de-dup. There is only a single partition on the topic, so it shouldn't be a partitioning issue either. I also noticed:

1. If I send a message while the consumer topology is running (i.e. after the first message has been processed), that message is only processed after a restart.

2. If I send many messages while the consumer is running and then do many restarts, only a single one of them gets processed. No idea what happens to the others.

I am utterly confused, and digging into the internals is not for the faint-hearted, but I can see that KafkaConsumer.poll() frequently returns with empty records. Will continue debugging that tomorrow...

Niclas

On Feb 18, 2018 18:50, "Fabian Hueske" <fhue...@gmail.com> wrote:

> Hi Niclas,
>
> Flink's Kafka consumer should not apply any deduplication. AFAIK, such a
> "feature" is not implemented.
> Do you produce into the topic that you want to read, or is the data in the
> topic static?
> If you do not produce into the topic while the consuming application is
> running, this might be an issue with the start position of the consumer
> [1].
>
> Best, Fabian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-consumers-start-position-configuration
>
> 2018-02-18 8:14 GMT+01:00 Niclas Hedhman <nic...@apache.org>:
>
>> Hi,
>> I am pretty new to Flink, I like what I see, and I have started to build
>> my first application using it.
>> I must be missing something very fundamental. I have a
>> FlinkKafkaConsumer011, followed by a handful of filter, map and flatMap
>> functions, terminated with the standard CassandraSink. I have try..catch
>> in all my own maps/filters. The first message in the queue is processed
>> after start-up, but any additional messages are ignored, i.e. they never
>> reach the first map(). They are swallowed (consumed but not forwarded).
>>
>> I suspect that some type of de-duplication is going on, related to the
>> (test) producer of these messages. The producer provides different values
>> on each message, but there is no "key" being passed to the KafkaProducer.
>>
>> Is a key required? And if so, why? Can I tell Flink or Flink's
>> KafkaConsumer to ingest all messages, and not try to de-duplicate them?
>>
>> Thanks
>>
>> --
>> Niclas Hedhman, Software Developer
>> http://zest.apache.org - New Energy for Java
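P.S. For completeness, the consumer side is wired up roughly as in the sketch below, and explicitly pinning the start position per your [1] is the next thing I will try. A minimal sketch against the Flink 1.4 API; the broker address, topic name, group id, and the print() stand-in for my filter/map/flatMap + CassandraSink chain are placeholders:

  import java.util.Properties;
  import org.apache.flink.streaming.api.datastream.DataStream;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
  import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
  import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

  public class DebugTopology {
      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env =
              StreamExecutionEnvironment.getExecutionEnvironment();

          Properties props = new Properties();
          props.setProperty("bootstrap.servers", "localhost:9092");
          props.setProperty("group.id", "my-group");

          FlinkKafkaConsumer011<String> consumer =
              new FlinkKafkaConsumer011<>("my-topic", new SimpleStringSchema(), props);
          // Read the topic from the beginning, ignoring committed group
          // offsets (note: start-position settings are ignored when
          // restoring from a savepoint):
          consumer.setStartFromEarliest();

          DataStream<String> stream = env.addSource(consumer);
          stream.print();  // stand-in for the real filter/map/flatMap + sink chain

          env.execute("kafka-debug");
      }
  }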