This is because we populate the keys in ReplicaManager.highWatermarkCheckpoints using the "dirs" config, but look up the key using log.dir.getParent, which returns the parent path without a trailing slash. So, if you have a trailing slash in the config, the two won't match. This seems to be a bug that we should fix. Could you file a JIRA?
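
To make the mismatch concrete, here is a minimal sketch. It is not the actual ReplicaManager code: the object name, the helper names and the String values standing in for checkpoint objects are all hypothetical, and the normalization shown is only one possible fix. It assumes a Unix-like system, where java.io.File drops the trailing separator.

```scala
import java.io.File

object TrailingSlashDemo {
  // Hypothetical stand-in for the checkpoint map: keyed by the raw config string.
  def checkpointsByRawDir(dirs: Seq[String]): Map[String, String] =
    dirs.map(d => d -> s"checkpoint for $d").toMap

  // One possible fix (an assumption, not the committed patch): key by the
  // path as java.io.File normalizes it, so both sides go through File.
  def checkpointsByNormalizedDir(dirs: Seq[String]): Map[String, String] =
    dirs.map(d => new File(d).getPath -> s"checkpoint for $d").toMap

  // The key used at lookup time: the parent of the partition directory.
  def lookupKey(configuredDir: String, partitionDirName: String): String =
    new File(new File(configuredDir), partitionDirName).getParent

  def main(args: Array[String]): Unit = {
    val configured = "/home/maxime/opt/kafka/data/kafka/" // note the trailing slash
    val key = lookupKey(configured, "test-0")

    // The raw map has no entry for the slash-less key, so apply() would throw
    // java.util.NoSuchElementException: key not found.
    println(s"lookup key:         $key")
    println(s"raw map hit:        ${checkpointsByRawDir(Seq(configured)).contains(key)}")
    println(s"normalized map hit: ${checkpointsByNormalizedDir(Seq(configured)).contains(key)}")
  }
}
```

Until a fix is in, dropping the trailing slash from the config value avoids the problem, as you found.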

Thanks,

Jun

On Tue, Jul 30, 2013 at 9:18 AM, Maxime Petazzoni <maxime.petazz...@turn.com> wrote:

> Nope, that's on a pretty standard GNU/Linux Debian system (jessie/sid)
> running a 3.9.8-1 kernel. But you were onto something. Removing the
> trailing slash in my log.dir config value made it work.
>
> I'm not sure why this would have an impact since the log directories seem
> to be correctly parsed as File objects in LogManager.scala:
>
>   val logDirs: Array[File] = config.logDirs.map(new File(_)).toArray
>
> And in both cases the server log reports that the log for 'test-0' was
> correctly loaded, which means that log directory is also correctly inserted
> into the logs Pool[TopicAndPartition, Log].
>
> That's about as far as my Scala knowledge goes though ;-) Let me know if
> you're able to reproduce the problem when you have a trailing slash as well.
>
> Thanks,
> /Maxime
> --
> Maxime Petazzoni
> Sr. Platform Engineer
> m 408.310.0595
> www.turn.com
>
> ________________________________________
> From: Jun Rao [jun...@gmail.com]
> Sent: Monday, July 29, 2013 9:32 PM
> To: users@kafka.apache.org
> Subject: Re: Leader not local for partition error?
>
> Are you on Windows? We have seen issues like that before on Windows. You
> may have to use "/" when configuring "log.dirs".
>
> Thanks,
>
> Jun
>
>
> On Mon, Jul 29, 2013 at 4:50 PM, Maxime Petazzoni <maxime.petazz...@turn.com> wrote:
>
> > Same issue with the 0.8 beta1 tarball. There is something interesting in
> > state-change.log though:
> >
> > [2013-07-29 16:47:26,708] TRACE Broker 0 received LeaderAndIsr request
> > correlationId 6 from controller 0 epoch 1 starting the become-leader
> > transition for partition [test,0] (state.change.logger)
> > [2013-07-29 16:47:26,736] ERROR Error on broker 0 while processing
> > LeaderAndIsr request correlationId 6 received from controller 0 epoch 1 for
> > partition (test,0) (state.change.logger)
> > java.util.NoSuchElementException: key not found: /home/maxime/opt/kafka/data/kafka
> >     at scala.collection.MapLike$class.default(MapLike.scala:223)
> >     at scala.collection.immutable.Map$Map1.default(Map.scala:93)
> >     at scala.collection.MapLike$class.apply(MapLike.scala:134)
> >     at scala.collection.immutable.Map$Map1.apply(Map.scala:93)
> >     at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:83)
> >     ...
> >
> > I have log.dir=/home/maxime/opt/kafka/data/kafka/ in server.properties.
> > That directory obviously exists after Kafka starts, and contains:
> >
> > find /home/maxime/opt/kafka/data/kafka/
> > /home/maxime/opt/kafka/data/kafka/
> > /home/maxime/opt/kafka/data/kafka/test-0
> > /home/maxime/opt/kafka/data/kafka/test-0/00000000000000000000.log
> > /home/maxime/opt/kafka/data/kafka/test-0/00000000000000000000.index
> > /home/maxime/opt/kafka/data/kafka/.lock
> >
> > Which is expected, given I have a single 'test' topic with a single
> > partition.
> >
> > Any ideas? Can you reproduce the problem on your end with a freshly
> > extracted tarball?
> >
> > Thanks,
> > /Max
> > --
> > Maxime Petazzoni
> > Sr. Platform Engineer
> > m 408.310.0595
> > www.turn.com
> >
> > ________________________________________
> > From: Jun Rao [jun...@gmail.com]
> > Sent: Sunday, July 21, 2013 9:38 PM
> > To: users@kafka.apache.org
> > Subject: Re: Leader not local for partition error?
> >
> > Any error/exception in state-change or controller log? Also, could you try
> > the 0.8 beta1 release?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Jul 15, 2013 at 1:36 PM, Maxime Petazzoni <maxime.petazz...@turn.com> wrote:
> >
> > > Hi all,
> > >
> > > I'm not sure if I'm doing something wrong or if I missed a step
> > > somewhere. A little while ago I successfully got the 0.8 quickstart
> > > example to work fine with the console producer/consumer. Then I went to
> > > work on some code to learn how to implement a producer, which failed
> > > with the producer not being able to send anything with the following
> > > error in the logs:
> > >
> > > Produce request with correlation id 11 failed due to [test,0]:
> > > kafka.common.NotLeaderForPartitionException
> > >
> > > So I went back to trying the console producer, and I'm getting the same
> > > error. To be sure, I removed all generated data by ZooKeeper and Kafka
> > > and re-followed the steps of the quickstart guide, but I'm getting the
> > > same error with the console producer/consumer.
> > >
> > > kafka-list-topic.sh correctly lists my 1-partition, 1-replica test
> > > topic:
> > >
> > > topic: test partition: 0 leader: 0 replicas: 0 isr: 0
> > >
> > > ZK and the broker are of course both up and running. When starting the
> > > producer, nothing out of the ordinary happens, but when starting the
> > > consumer (before attempting to send anything), I get the following
> > > exception:
> > >
> > > [2013-07-15 13:25:46,487] INFO
> > > [ConsumerFetcherThread-console-consumer-943_polygon-1373919946074-f478ba53-0-0],
> > > Starting (kafka.consumer.ConsumerFetcherThread)
> > > [2013-07-15 13:25:46,517] ERROR
> > > [console-consumer-943_polygon-1373919946074-f478ba53-leader-finder-thread],
> > > Error due to (kafka.consumer.ConsumerFetcherManager$LeaderFinderThread)
> > > kafka.common.NotLeaderForPartitionException
> > >     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > >
> > > I'm at a loss as to what's going on here. When the broker starts it clearly
> > > goes through the election process and becomes the leader (since it's the
> > > only broker anyway) for the 'test' topic:
> > >
> > > [2013-07-15 13:33:05,345] INFO 0 successfully elected as leader
> > > (kafka.server.ZookeeperLeaderElector)
> > > [2013-07-15 13:33:05,550] INFO [Replica Manager on Broker 0]: Handling
> > > LeaderAndIsr request
> > > Name:LeaderAndIsrRequest;Version:0;Controller:0;ControllerEpoch:3;CorrelationId:0;ClientId:id_0-host_null-port_9092;PartitionState:(test,0)
> > > ->
> > > (LeaderAndIsrInfo:(Leader:0,ISR:0,LeaderEpoch:0,ControllerEpoch:1),ReplicationFactor:1),AllReplicas:0);Leaders:id:0,host:
> > > polygon.turn.com,port:9092 (kafka.server.ReplicaManager)
> > > [2013-07-15 13:33:05,551] INFO New leader is 0
> > > (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
> > > [2013-07-15 13:33:05,563] INFO [Kafka Server 0], Started
> > > (kafka.server.KafkaServer)
> > > [2013-07-15 13:33:05,563] INFO [ReplicaFetcherManager on broker 0]
> > > Removing fetcher for partition [test,0] (kafka.server.ReplicaFetcherManager)
> > > [2013-07-15 13:33:05,566] INFO [Replica Manager on Broker 0]: Handled
> > > leader and isr request
> > > Name:LeaderAndIsrRequest;Version:0;Controller:0;ControllerEpoch:3;CorrelationId:0;ClientId:id_0-host_null-port_9092;PartitionState:(test,0)
> > > ->
> > > (LeaderAndIsrInfo:(Leader:0,ISR:0,LeaderEpoch:0,ControllerEpoch:1),ReplicationFactor:1),AllReplicas:0);Leaders:id:0,host:
> > > polygon.turn.com,port:9092 (kafka.server.ReplicaManager)
> > >
> > > I'm running Kafka from branch 0.8 (b1891e7). Any idea what's going on
> > > there?
> > >
> > > Thanks,
> > > /Max
> > >
> > > --
> > > Maxime Petazzoni
> > > Sr. Platform Engineer
> > > m 408.310.0595
> > > www.turn.com