Hi Niclas,

About the second point you mentioned, was the single message that got processed 
a random one or always the same one?

The default startup mode for FlinkKafkaConsumer is StartupMode.GROUP_OFFSETS; 
maybe you could try StartupMode.EARLIEST (set via setStartFromEarliest()) while 
debugging. Also, before that, you may try fetching the messages with the Kafka 
console consumer tool to see whether they can be consumed completely.
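
For example, a minimal sketch (the topic name, broker address, and group id 
below are placeholders):

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("group.id", "flink-debug");

    FlinkKafkaConsumer011<String> consumer =
        new FlinkKafkaConsumer011<>("my-topic", new SimpleStringSchema(), props);

    // The default resumes from the committed group offsets; this overrides
    // it so the consumer re-reads the topic from the beginning.
    consumer.setStartFromEarliest();

And for the console check, something like:

    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
        --topic my-topic --from-beginning

If the console consumer shows all the messages, the problem is on the Flink 
side rather than in the topic itself.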

Besides, I wonder if you could provide the code for your Flink pipeline. That'll 
be helpful.
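
For reference, a skeleton of the topology you described (FlinkKafkaConsumer011 
followed by filters/maps and a CassandraSink) would look roughly like the 
following; the topic, the filter/map bodies, the Cassandra host, and the insert 
query are all placeholders:

    import java.util.Properties;
    import java.util.UUID;

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.cassandra.CassandraSink;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

    public class KafkaToCassandraJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "flink-debug");

            FlinkKafkaConsumer011<String> consumer =
                new FlinkKafkaConsumer011<>("my-topic", new SimpleStringSchema(), props);
            consumer.setStartFromEarliest(); // read everything while debugging

            DataStream<Tuple2<String, String>> rows = env
                .addSource(consumer)
                .filter(value -> !value.isEmpty()) // stand-in for your filters
                .map(new MapFunction<String, Tuple2<String, String>>() {
                    @Override
                    public Tuple2<String, String> map(String value) {
                        // stand-in for your map/flatMap chain
                        return Tuple2.of(UUID.randomUUID().toString(), value);
                    }
                });

            CassandraSink.addSink(rows)
                .setQuery("INSERT INTO demo.messages (id, payload) VALUES (?, ?);")
                .setHost("127.0.0.1")
                .build();

            env.execute("Kafka to Cassandra debug job");
        }
    }

Seeing where your actual code differs from something like this would make it 
easier to spot the problem.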

Best,
Xingcan



> On 18 Feb 2018, at 7:52 PM, Niclas Hedhman <nic...@apache.org> wrote:
> 
> 
> So, the producer is (at the moment) run manually from the command line, one 
> message at a time.
> Kafka's tooling (different consumer group) shows that a message is added each 
> time.
> 
> Since my last post, I have also added a UUID as the key, and that didn't make 
> a difference, so you are likely correct about de-dup.
> 
> 
> There is only a single partition on the topic, so it shouldn't be a 
> partitioning issue.
> 
> I also noticed:
> 1. If I send a message while the consumer topology is running (after the 
> first message), that message is only processed after a restart.
> 
> 2. If I send many messages while the consumer is running and then do many 
> restarts, only one of them gets processed. No idea what happens to the 
> others.
> 
> I am utterly confused.
> 
> And digging into the internals is not for the faint-hearted, but 
> kafka.poll() frequently returns with empty records.
> 
> Will continue debugging that tomorrow...
> 
> 
> Niclas
> 
> On Feb 18, 2018 18:50, "Fabian Hueske" <fhue...@gmail.com> wrote:
> Hi Niclas,
> 
> Flink's Kafka consumer should not apply any deduplication. AFAIK, such a 
> "feature" is not implemented.
> Do you produce into the topic that you want to read or is the data in the 
> topic static?
> If you do not produce into the topic while the consuming application is 
> running, this might be an issue with the start position of the consumer [1]. 
> 
> Best, Fabian
> 
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-consumers-start-position-configuration
> 
> 2018-02-18 8:14 GMT+01:00 Niclas Hedhman <nic...@apache.org>:
> Hi,
> I am pretty new to Flink, and I like what I see and have started to build my 
> first application using it.
> I must be missing something very fundamental. I have a FlinkKafkaConsumer011, 
> followed by a handful of filter, map and flatMap functions and terminated 
> with the standard CassandraSink. I have try..catch on all my own maps/filters, 
> and the first message in the queue is processed after start-up, but any 
> additional messages are ignored, i.e. they never reach the first map(); they 
> are swallowed (consumed but not forwarded).
> 
> I suspect that there is some type of de-duplication going on, related to the 
> (test) producer of these messages. The producer provides different values on 
> each message, but there is no "key" being passed to the KafkaProducer.
> 
> Is that required? And if so, why? Can I tell Flink or Flink's KafkaConsumer 
> to ingest all messages, and not try to de-duplicate them?
> 
> Thanks
> 
> --
> Niclas Hedhman, Software Developer
> http://zest.apache.org - New Energy for Java
> 
