Jun,

I am trying to test how KafkaProducer behaves when the topic replication factor is 1, in two scenarios:

1. One broker is offline BEFORE KafkaProducer starts sending messages.
   Because of the bug I mentioned, KafkaProducer sends to the offline
   partition and hangs forever.
2. One broker goes offline WHILE KafkaProducer is sending messages.
   KafkaProducer seems to hang forever in this case as well. I am still
   looking into it.

Do you mind taking a look?

Thanks

On Mon, Feb 23, 2015 at 7:01 PM, Jun Rao <j...@confluent.io> wrote:

> The logic in that code is to cycle through all partitions and return as
> soon as we see a partition with a leader. I do see an issue: if there
> are multiple threads sending messages to the same producer concurrently, we
> may not cycle through all partitions and therefore we could return an
> unavailable partition even when available partitions are present.
>
> Do you see this issue with just a single thread producing messages? The
> current logic seems to work correctly in that case.
>
> Thanks,
>
> Jun
>
> On Fri, Feb 20, 2015 at 12:45 PM, Xiaoyu Wang <xw...@rocketfuel.com> wrote:
>
> > Found the problem - it is a bug in the Partitioner of the Kafka client.
> > Can you guys confirm and patch it in the Kafka clients?
> >
> > for (int i = 0; i < numPartitions; i++) {
> >     int partition = Utils.abs(counter.getAndIncrement()) % numPartitions;
> >     if (partitions.get(partition).leader() != null) {
> >         return partitions.get(partition).partition();
> >     }
> > }
> >
> > On Fri, Feb 20, 2015 at 2:35 PM, Xiaoyu Wang <xw...@rocketfuel.com> wrote:
> >
> > > Update:
> > >
> > > I am using kafka.clients 0.8.2-beta. Below are the test steps:
> > >
> > > 1. Set up a local Kafka cluster with 2 brokers, 0 and 1.
> > > 2. Create topic X with replication factor 1 and 4 partitions.
> > > 3. Verify that each broker has two partitions.
> > > 4. Shut down broker 1.
> > > 5. Start a producer sending data to topic X using KafkaProducer with
> > >    required ack = 1.
> > > 6. The producer hangs and does not exit.
> > >
> > > Offline partitions were taken care of when the partition is null (code
> > > attached below). However, the timeout setting does not seem to work.
> > > Not sure what caused KafkaProducer to hang.
> > >
> > > // choose the next available node in a round-robin fashion
> > > for (int i = 0; i < numPartitions; i++) {
> > >     int partition = Utils.abs(counter.getAndIncrement()) % numPartitions;
> > >     if (partitions.get(partition).leader() != null)
> > >         return partition;
> > > }
> > > // no partitions are available, give a non-available partition
> > > return Utils.abs(counter.getAndIncrement()) % numPartitions;
> > >
> > > On Fri, Feb 20, 2015 at 1:48 PM, Xiaoyu Wang <xw...@rocketfuel.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> I am experimenting with sending data to Kafka using KafkaProducer and
> > >> found that when a partition is completely offline, e.g. a topic with
> > >> replication factor = 1 and some broker is down, KafkaProducer seems to
> > >> hang forever. It does not even exit with the timeout setting. Can you
> > >> take a look?
> > >>
> > >> I checked the code and found that the partitioner picks a partition
> > >> based on the total number of partitions, including the offline ones.
> > >> Is it possible to change ProducerClient to ignore offline partitions?
> > >>
> > >> Thanks,
> > >>
> > >> -Xiaoyu
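
For reference, here is a minimal sketch of the "ignore offline partitions" idea from the first message in this thread. It is not the Kafka client code and not a proposed patch: `PartitionMeta` and `AvailablePartitionChooser` are hypothetical names made up for illustration, and the leader is modeled as a plain string that is null when the partition is offline. The sketch assumes that taking one counter value per call and indexing only into the partitions that have a leader is acceptable, which would avoid the case Jun describes where concurrent callers advance the shared counter and a caller never sees an available partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the partition metadata used in the thread.
final class PartitionMeta {
    final int id;        // partition id
    final String leader; // leader broker, or null when the partition is offline

    PartitionMeta(int id, String leader) {
        this.id = id;
        this.leader = leader;
    }
}

// Hypothetical chooser: round-robin over partitions that currently have a leader.
final class AvailablePartitionChooser {
    private final AtomicInteger counter = new AtomicInteger(0);

    int choose(List<PartitionMeta> partitions) {
        if (partitions.isEmpty()) {
            throw new IllegalArgumentException("topic has no partitions");
        }
        // Keep only partitions with a known leader.
        List<PartitionMeta> available = new ArrayList<>();
        for (PartitionMeta p : partitions) {
            if (p.leader != null) {
                available.add(p);
            }
        }
        // If nothing is available, fall back to the full list, as the quoted code does.
        List<PartitionMeta> candidates = available.isEmpty() ? partitions : available;
        // One counter value per call; floorMod keeps the index non-negative after overflow.
        int index = Math.floorMod(counter.getAndIncrement(), candidates.size());
        // Return the partition id, not the position in the list.
        return candidates.get(index).id;
    }
}
```

With a four-partition topic where two partitions have no leader, repeated calls to `choose` alternate between the two partitions that do. Note that this only covers partition selection; it does not explain or address the timeout behavior reported above.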