Btw, the missing msgs appear to be at the end of the CSV file, so maybe the producer doesn't flush properly when it gets EOF on stdin?
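One way to test that guess is to compare the number of input lines with how far the topic's end offset moves during a producer run. This is only a minimal sketch: it assumes it is run from the Kafka install directory, that nothing else writes to the topic while it runs, and that the topic is not compacted. The function names (`sum_offsets`, `end_offset`, `count_missing`) are made up for illustration and are not part of Kafka.

```shell
# Sum the end offsets printed by GetOffsetShell, which emits one
# "topic:partition:offset" line per partition.
sum_offsets() {
  awk -F: '{ total += $3 } END { print total + 0 }'
}

# Ask the broker for the current end offset of a topic (--time -1 = latest).
# Assumption: the current directory is the Kafka install directory.
end_offset() {
  bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
    --broker-list "$2" --topic "$1" --time -1 | sum_offsets
}

# Pipe a file through the console producer and report how many lines
# did not show up as new messages on the topic.
count_missing() {
  file=$1; topic=$2; broker=$3
  before=$(end_offset "$topic" "$broker")
  bin/kafka-console-producer.sh --topic "$topic" --broker-list "$broker" < "$file"
  after=$(end_offset "$topic" "$broker")
  # grep -c '' also counts a final line that has no trailing newline.
  lines=$(grep -c '' "$file")
  published=$((after - before))
  echo "input=$lines published=$published missing=$((lines - published))"
}
```

For the setup quoted below this would be called as `count_missing LICENSE test1 localhost:9092`. A non-zero `missing` value that varies between runs would fit the picture of records still sitting in the producer's buffer when it exits.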
On Wed, Jun 15, 2016 at 11:21 AM, Dean Arnold <renodino...@gmail.com> wrote:

> I'm seeing similar issues with 0.9.0.1.
>
> I'm feeding CSV records (65536 total, 1 record per msg) to the console
> producer, which are consumed via a sink connector (using connect-standalone
> and a single partition). The sink occasionally reports flushing less than
> 65536 msgs via the sink flush(). Restarting the sink connector with a
> forced reset to offset 0 (ie, replaying all the msgs on the topic) shows
> that the messages are still missing (ie, no gaps in offsets), so I assume
> the msgs must be lost by the producer ?
>
> On Wed, Jun 15, 2016 at 1:29 AM, Radu Radutiu <rradu...@gmail.com> wrote:
>
>> Hi,
>>
>> I was following the Quickstart guide and I have noticed that
>> ConsoleProducer does not publish all messages (the number of messages
>> published differs from one run to another) and happens mostly on a fresh
>> started broker.
>>
>> version: kafka_2.11-0.10.0.0
>> OS: Linux (Ubuntu 14.04, Centos 7.2)
>> JDK: java version "1.7.0_101"
>> OpenJDK Runtime Environment (IcedTea 2.6.6) (7u101-2.6.6-0ubuntu0.14.04.1),
>> openjdk version "1.8.0_91"
>> OpenJDK Runtime Environment (build 1.8.0_91-b14)
>>
>> How to reproduce:
>>
>> - start zookeeper:
>> ~/work/kafka_2.11-0.10.0.0$ bin/zookeeper-server-start.sh config/zookeeper.properties &
>>
>> - start kafka:
>> ~/work/kafka_2.11-0.10.0.0$ bin/kafka-server-start.sh config/server.properties &
>>
>> - start console consumer (topic test1 is already created):
>> ~/work/kafka_2.11-0.10.0.0$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 -topic test1 --zookeeper localhost:2181
>>
>> - in another terminal start console producer with the LICENSE file in kafka
>> directory as input:
>> ~/work/kafka_2.11-0.10.0.0$ bin/kafka-console-producer.sh --topic test1 --broker-list localhost:9092 <LICENSE
>>
>> The last line in the console consumer output is not the last line in the
>> LICENSE file for the first few runs of the console producer. If I use the
>> --old-producer parameter, all the lines in the LICENSE file are published
>> (and appear in the console consumer output). Different runs of console
>> producer with the same input file publish different number of lines
>> (sometimes all, sometimes only 182 lines out of 330). I've noticed that if
>> the kafka server was started a long time ago the console producer publishes
>> all lines.
>>
>> I have checked the kafka binary log file (in my case
>> /tmp/kafka-logs/test1-0/00000000000000000000.log ) and confirmed that the
>> messages are not published (the console consumer receives all the
>> messages).
>>
>> Is there an explanation for this behavior?
>>
>> Best regards,
>> Radu