Hi: I’ve been quite a fan of Kafka / Samza for about a year now. The project I’ve been working on at TripAdvisor is about to soft launch. Lately we’ve noticed replication errors coming from Kafka (0.8.2.0). Samza has exclusive use of this Kafka cluster, apart from a small application that feeds it data. The Samza version is 0.9.1.
It feels like we’re stumbling on the same issue described here: https://issues.apache.org/jira/browse/KAFKA-2143. I attempted to apply the patch, but it was written for a version of Kafka newer than the one we’re using (and the one recommended by Samza). Here’s an example from the broker logs:

2015-12-18 02:31:35,573 ERROR kafka.server.ReplicaManager: [Replica Manager on Broker 230]: Error when processing fetch request for partition [Stream_<label>_Prod,9] offset 78214693 from consumer with correlation id 114263. Possible cause: Request for offset 78214693 but we only have log segments in the range 79642852 to 128218996.

We don’t see any errors in the Samza logs until much later:

2015-12-18 04:56:35 ERROR consumer.ConsumerIterator:97 - consumed offset: 76451107 doesn't match fetch offset: 76741109 for Stream_<label>_Prod:6: fetched offset = 76741629: consumed offset = 76451107; Consumer may lose data

We have 3 Kafka nodes and 5 Samza nodes. Topics have 2x replication. There are 60 topics in total; 10 of them carry the bulk of the data. I’m charting Kafka’s under-replicated-partitions metric, and it shows at most 4 partitions under-replicated. That happens briefly while we’re flushing out stale data. Ironically, the replication errors seem to happen when the system is under low activity.

Our application is very bursty: our front-end servers roll logs every hour. Yesterday, for example, we replayed old logs at full speed for 8 hours without issues. Once we caught up and the application went into "burst mode", we started to see the errors.

Has anyone seen issues like this? I checked the Samza mailing list for the past 3 months and didn’t see anything about it. I’m hoping it’s the way we configured Kafka? Our settings are below. Thanks in advance for any help.
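In case it helps anyone reproduce the comparison: the offset range in the broker error can be cross-checked from the command line with the stock Kafka 0.8.x tools. Roughly (the ZooKeeper/broker hosts and the bin/ path are placeholders for your install; the topic name is the one from the error above):

```shell
# List any partitions the cluster currently considers under-replicated
# (same signal as the JMX under-replicated-partitions metric):
bin/kafka-topics.sh --zookeeper zkhost:2181 --describe --under-replicated-partitions

# Print the earliest (--time -2) and latest (--time -1) offsets each broker
# actually has on disk for the topic, to compare against the fetch offset
# the ReplicaManager error complains about:
bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list broker:9092 --topic Stream_<label>_Prod --time -2
bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list broker:9092 --topic Stream_<label>_Prod --time -1
```

If the consumer’s requested offset falls below the earliest offset reported here, the segments were already deleted by retention before the fetch, which matches the "we only have log segments in the range …" message.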
Regards,
Louis Calisi

auto.create.topics.enable=true
auto.leader.rebalance.enable=true
broker.id=230
controlled.shutdown.enable=true
default.replication.factor=2
delete.topic.enable=true
jmx_port=XXXX
kafka.http.metrics.host=0.0.0.0
kafka.http.metrics.port=XXXXX
kafka.log4j.dir=/var/log/kafka
log.dirs=/data/2/kafka/data,/data/3/kafka/data,/data/4/kafka/data,/data/5/kafka/data,/data/6/kafka/data,/data/7/kafka/data,/data/8/kafka/data,/data/9/kafka/data,/data/10/kafka/data,/data/11/kafka/data,/data/12/kafka/data
log.retention.bytes=-1
log.retention.hours=12
log.roll.hours=24
log.segment.bytes=1073741824
max.connections.per.ip=400
message.max.bytes=1000000
min.insync.replicas=2
num.partitions=1
port=XXXX
replica.fetch.max.bytes=1048576
replica.lag.max.messages=4000
unclean.leader.election.enable=false
zookeeper.session.timeout.ms=30000
num.io.threads=12
num.network.threads=8
log.cleaner.enable=true
log.cleaner.threads=3
offsets.storage=kafka
dual.commit.enabled=false
zookeeper.connect=…..
kafka.metrics.reporters=nl.techop.kafka.KafkaHttpMetricsReporter