The logic in that code is to cycle through all partitions and return as
soon as we see a partition with a leader. I do see an issue: if multiple
threads send messages through the same producer concurrently, they all
increment the same counter, so a given thread may not cycle through all
partitions and could therefore return an unavailable partition even when
available partitions are present.
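
A minimal sketch of one way to avoid the skipping described above, assuming
each call reads the shared counter only once and then scans from that
starting point locally. The class RoundRobinSketch, the method nextPartition,
and the List<Boolean> stand-in for partition metadata are hypothetical names
for illustration, not the actual client code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the Kafka client's partitioner.
class RoundRobinSketch {
    private final AtomicInteger counter = new AtomicInteger(0);

    // hasLeader.get(i) == true means partition i currently has a leader
    // (an assumed stand-in for checking leader() != null on real metadata).
    int nextPartition(List<Boolean> hasLeader) {
        int numPartitions = hasLeader.size();
        // Touch the shared counter exactly once per call, so increments made
        // by other threads cannot make this scan skip a partition.
        int start = toPositive(counter.getAndIncrement()) % numPartitions;
        for (int i = 0; i < numPartitions; i++) {
            int partition = (start + i) % numPartitions;
            if (hasLeader.get(partition)) {
                return partition;
            }
        }
        // No partition is available; fall back to the starting one.
        return start;
    }

    // Clear the sign bit so the result is non-negative even after overflow.
    private static int toPositive(int x) {
        return x & 0x7fffffff;
    }

    public static void main(String[] args) {
        RoundRobinSketch sketch = new RoundRobinSketch();
        // Partitions 0 and 2 are offline, 1 and 3 have leaders.
        List<Boolean> hasLeader = Arrays.asList(false, true, false, true);
        for (int k = 0; k < 4; k++) {
            System.out.println(sketch.nextPartition(hasLeader));
        }
    }
}
```

Because the loop index is local to the call, every partition is visited once
no matter how many other threads call nextPartition at the same time.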

Do you see this issue with just a single thread producing messages? The
current logic seems to work correctly in that case.

Thanks,

Jun

On Fri, Feb 20, 2015 at 12:45 PM, Xiaoyu Wang <xw...@rocketfuel.com> wrote:

> Found the problem - it is a bug in the Partitioner of the Kafka client. Can
> you guys confirm and patch it in the Kafka clients?
>
> for (int i = 0; i < numPartitions; i++) {
>     int partition = Utils.abs(counter.getAndIncrement()) % numPartitions;
>     if (partitions.get(partition).leader() != null) {
>         return partitions.get(partition).partition();
>     }
> }
>
>
>
> On Fri, Feb 20, 2015 at 2:35 PM, Xiaoyu Wang <xw...@rocketfuel.com> wrote:
>
> > Update:
> >
> > I am using kafka.clients 0.8.2-beta. Below are the test steps:
> >
> >    1. set up a local Kafka cluster with 2 brokers, 0 and 1
> >    2. create topic X with replication factor 1 and 4 partitions
> >    3. verify that each broker has two partitions
> >    4. shut down broker 1
> >    5. start a producer sending data to topic X using KafkaProducer with
> >    required ack = 1
> >    6. the producer hangs and does not exit.
> >
> > Offline partitions are taken care of when the partition is null (code
> > attached below). However, the timeout setting does not seem to work. Not
> > sure what caused KafkaProducer to hang.
> >
> > // choose the next available node in a round-robin fashion
> > for (int i = 0; i < numPartitions; i++) {
> >     int partition = Utils.abs(counter.getAndIncrement()) % numPartitions;
> >     if (partitions.get(partition).leader() != null)
> >         return partition;
> > }
> > // no partitions are available, give a non-available partition
> > return Utils.abs(counter.getAndIncrement()) % numPartitions;
> >
> >
> >
> >
> >
> > On Fri, Feb 20, 2015 at 1:48 PM, Xiaoyu Wang <xw...@rocketfuel.com> wrote:
> >
> >> Hello,
> >>
> >> I am experimenting with sending data to Kafka using KafkaProducer and
> >> found that when a partition is completely offline, e.g. a topic with
> >> replication factor = 1 and some broker is down, KafkaProducer seems to
> >> hang forever. It does not exit even with the timeout setting. Can you
> >> take a look?
> >>
> >> I checked the code and found that the partitioner chooses a partition
> >> based on the total number of partitions, including offline ones. Is it
> >> possible to change ProducerClient to ignore offline partitions?
> >>
> >>
> >> Thanks,
> >>
> >> -Xiaoyu
> >>
> >>
> >
>
