So, at the moment the producer is run manually from the command line, one
message at a time.
Kafka's own tooling (watching with a different consumer group) shows that a
message is added to the topic each time.

Since my last post, I have also added a UUID as the message key, and that
made no difference, so you are likely correct that de-duplication is not
the cause.
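
For reference, the keyed send looks roughly like this (a minimal sketch,
not my actual code; topic name, broker address and payload are
placeholders):

    import java.util.Properties;
    import java.util.UUID;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class OneShotProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer =
                     new KafkaProducer<>(props)) {
                // One record per invocation, keyed with a random UUID
                producer.send(new ProducerRecord<>("my-topic",
                    UUID.randomUUID().toString(), args[0]));
                producer.flush();
            }
        }
    }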


There is only a single partition on the topic, so it shouldn't be a
partitioning issue.

I also noticed:
1. If I send a message while the consumer topology is running (after the
first message), that message is only processed after a restart.

2. If I send many messages while the consumer is running and then do many
restarts, only a single one of those is processed. No idea what happens to
the others.

I am utterly confused.

And digging into the internals is not for the faint-hearted, but I can see
that the Kafka consumer's poll() frequently returns with empty records.
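
As a sanity check, I intend to read the topic with a plain KafkaConsumer
outside of Flink, under a fresh consumer group so no committed offsets get
in the way (a sketch; topic and broker are placeholders):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class DebugConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // Fresh group id => no committed offsets to resume from
            props.put("group.id", "debug-" + System.currentTimeMillis());
            props.put("auto.offset.reset", "earliest");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer =
                     new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                while (true) {
                    ConsumerRecords<String, String> records =
                        consumer.poll(1000);
                    for (ConsumerRecord<String, String> r : records) {
                        System.out.printf("offset=%d key=%s value=%s%n",
                            r.offset(), r.key(), r.value());
                    }
                }
            }
        }
    }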

Will continue debugging that tomorrow...


Niclas

On Feb 18, 2018 18:50, "Fabian Hueske" <fhue...@gmail.com> wrote:

> Hi Niclas,
>
> Flink's Kafka consumer should not apply any deduplication. AFAIK, such a
> "feature" is not implemented.
> Do you produce into the topic that you want to read or is the data in the
> topic static?
> If you do not produce into the topic while the consuming application is
> running, this might be an issue with the start position of the consumer
> [1].
>
> Best, Fabian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-consumers-start-position-configuration
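>
> A minimal sketch of setting the start position explicitly (assuming
> "my-topic", a SimpleStringSchema and a properties object carrying
> bootstrap.servers and group.id; not necessarily the right fix):
>
>     // env is the job's StreamExecutionEnvironment
>     FlinkKafkaConsumer011<String> consumer = new FlinkKafkaConsumer011<>(
>         "my-topic", new SimpleStringSchema(), properties);
>     // Ignore committed group offsets and always read from the beginning
>     consumer.setStartFromEarliest();
>     DataStream<String> stream = env.addSource(consumer);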
>
> 2018-02-18 8:14 GMT+01:00 Niclas Hedhman <nic...@apache.org>:
>
>> Hi,
>> I am pretty new to Flink, and I like what I see and have started to build
>> my first application using it.
>> I must be missing something very fundamental. I have a
>> FlinkKafkaConsumer011, followed by a handful of filter, map and flatMap
>> functions, terminated with the standard CassandraSink. I have try..catch
>> in all of my own maps/filters. The first message in the queue is
>> processed after start-up, but any additional messages never reach the
>> first map(); they appear to be consumed but not forwarded.
>>
>> I suspect that some type of de-duplication is going on. The (test)
>> producer provides a different value in each message, but there is no
>> "key" being passed to the KafkaProducer.
>>
>> Is that required? And if so, why? Can I tell Flink or Flink's
>> KafkaConsumer to ingest all messages, and not try to de-duplicate them?
>>
>> Thanks
>>
>> --
>> Niclas Hedhman, Software Developer
>> http://zest.apache.org - New Energy for Java
>>
>
>
