Hi Jun,

I was using the 0.8 branch, two commits behind, but I am now on the latest
code and see the same issue. Three topics A, B, and C, created automatically
with a replication factor of 2 and 2 partitions each, across 2 brokers (0 and 1).

The list of topics in ZooKeeper is the following:

topic: A      partition: 0    leader: 1       replicas: 1,0   isr: 1
topic: A      partition: 1    leader: 0       replicas: 0,1   isr: 0,1
topic: B      partition: 0    leader: 0       replicas: 0,1   isr: 0,1
topic: B      partition: 1    leader: 1       replicas: 1,0   isr: 1
topic: C      partition: 0    leader: 1       replicas: 1,0   isr: 1
topic: C      partition: 1    leader: 0       replicas: 0,1   isr: 0,1
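Note that for every partition whose leader is broker 1, the ISR has already
shrunk to broker 1 alone, so broker 0 is out of sync there. A quick sketch
(a hypothetical helper, assuming the whitespace-separated listing format
above) to flag those partitions:

```python
# Flag partitions whose in-sync replica set (isr) is smaller than the
# assigned replica set -- i.e. partitions with an out-of-sync replica.
def under_replicated(listing):
    flagged = []
    for line in listing.strip().splitlines():
        tokens = line.split()
        # Pair label tokens ("topic:", "partition:", ...) with their values.
        info = dict(zip(tokens[0::2], tokens[1::2]))
        replicas = set(info["replicas:"].split(","))
        isr = set(info["isr:"].split(","))
        if isr < replicas:  # proper subset: some replica fell out of sync
            flagged.append((info["topic:"], info["partition:"],
                            sorted(replicas - isr)))
    return flagged

listing = """\
topic: A  partition: 0    leader: 1       replicas: 1,0   isr: 1
topic: A  partition: 1    leader: 0       replicas: 0,1   isr: 0,1
topic: B  partition: 0    leader: 0       replicas: 0,1   isr: 0,1
topic: B  partition: 1    leader: 1       replicas: 1,0   isr: 1
topic: C  partition: 0    leader: 1       replicas: 1,0   isr: 1
topic: C  partition: 1    leader: 0       replicas: 0,1   isr: 0,1
"""

print(under_replicated(listing))
# [('A', '0', ['0']), ('B', '1', ['0']), ('C', '0', ['0'])]
```

In every flagged partition the missing replica is broker 0, which matches
the symptom described below.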


*Broker 1*

This is the one I started first. It works well and writes messages to
disk. In state-change.log I have no errors, just TRACE rows:

[2013-06-13 08:51:33,505] TRACE Broker 1 cached leader info
(LeaderAndIsrInfo:(Leader:0,ISR:0,1,LeaderEpoch:0,ControllerEpoch:1),ReplicationFactor:2),AllReplicas:0,1)
for partition [C,1] in response to UpdateMetadata request sent by
controller 1 epoch 1 with correlation id 10 (state.change.logger)
[2013-06-13 08:51:33,506] TRACE Controller 1 epoch 1 received response
correlationId 10 for a request sent to broker 1 (state.change.logger)
[2013-06-13 08:51:33,509] TRACE Controller 1 epoch 1 changed state of
replica 0 for partition [C,1] to OnlineReplica (state.change.logger)
[2013-06-13 08:51:33,510] TRACE Controller 1 epoch 1 changed state of
replica 1 for partition [C,0] to OnlineReplica (state.change.logger)
[2013-06-13 08:51:33,511] TRACE Controller 1 epoch 1 changed state of
replica 0 for partition [B,1] to OnlineReplica (state.change.logger)
[2013-06-13 08:51:33,511] TRACE Controller 1 epoch 1 changed state of
replica 0 for partition [C,0] to OnlineReplica (state.change.logger)
[2013-06-13 08:51:33,512] TRACE Controller 1 epoch 1 changed state of
replica 0 for partition [B,0] to OnlineReplica (state.change.logger)
[2013-06-13 08:51:33,512] TRACE Controller 1 epoch 1 changed state of
replica 1 for partition [B,0] to OnlineReplica (state.change.logger)
[2013-06-13 08:51:33,513] TRACE Controller 1 epoch 1 changed state of
replica 1 for partition [B,1] to OnlineReplica (state.change.logger)
[2013-06-13 08:51:33,513] TRACE Controller 1 epoch 1 changed state of
replica 1 for partition [C,1] to OnlineReplica (state.change.logger)
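The TRACE volume makes real problems easy to miss; a throwaway filter
like this (a sketch, fed with lines from the excerpt above) confirms
nothing at WARN or above is hiding in the noise:

```python
# Surface log entries at WARN or above from a state-change.log excerpt,
# where each entry starts with "[timestamp] LEVEL ...".
import re

LEVEL = re.compile(r"\] (TRACE|DEBUG|INFO|WARN|ERROR|FATAL) ")

def errors_only(lines):
    """Keep only lines logged at WARN, ERROR, or FATAL."""
    keep = {"WARN", "ERROR", "FATAL"}
    return [line for line in lines
            if (m := LEVEL.search(line)) and m.group(1) in keep]

sample = [
    "[2013-06-13 08:51:33,505] TRACE Broker 1 cached leader info ...",
    "[2013-06-13 08:51:33,513] TRACE Controller 1 epoch 1 changed state of ...",
]
print(errors_only(sample))  # [] -> nothing above TRACE in the excerpt
```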

$ du -sh /mnt/kafka-logs/*

4.0K    /mnt/kafka-logs/replication-offset-checkpoint
163M    /mnt/kafka-logs/A-0
4.0K    /mnt/kafka-logs/A-1
4.0K    /mnt/kafka-logs/B-0
90M     /mnt/kafka-logs/B-1
16K     /mnt/kafka-logs/C-0
4.0K    /mnt/kafka-logs/C-1



*Broker 0*

Configuration is the same as broker 1's, except for broker.id. This broker
doesn't write to disk: /mnt/kafka-logs is empty, without a single file.

It logs a never-ending stream of:

[2013-06-13 09:08:53,814] WARN [KafkaApi-0] Produce request with
correlation id 735114 from client  on partition [A,1] failed due to
Partition [request,1] doesn't exist on 0 (kafka.server.KafkaApis)
[2013-06-13 09:08:53,815] WARN [KafkaApi-0] Produce request with
correlation id 519064 from client  on partition [B,0] failed due to
Partition [response,0] doesn't exist on 0 (kafka.server.KafkaApis)
[2013-06-13 09:08:53,815] WARN [KafkaApi-0] Produce request with
correlation id 735118 from client  on partition [A,1] failed due to
Partition [request,1] doesn't exist on 0 (kafka.server.KafkaApis)
[2013-06-13 09:08:53,815] WARN [KafkaApi-0] Produce request with
correlation id 519068 from client  on partition [B,0] failed due to
Partition [response,0] doesn't exist on 0 (kafka.server.KafkaApis)
...

*Server Configuration*
port=9092
num.network.threads=2
num.io.threads=2
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dir=/mnt/kafka-logs
auto.create.topics.enable=true
default.replication.factor=2
num.partitions=2
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=168
log.segment.bytes=536870912
log.cleanup.interval.mins=1
zookeeper.connect=xxx1:2181,xxx2:2181,xxx3:2181
zookeeper.connection.timeout.ms=1000000
kafka.metrics.polling.interval.secs=5
kafka.metrics.reporters=kafka.metrics.KafkaCSVMetricsReporter
kafka.csv.metrics.dir=/mnt/kafka_metrics
kafka.csv.metrics.reporter.enabled=false
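As a sanity check that the two brokers really do share this file, here is a
small sketch that parses properties files of this shape and diffs two
copies, ignoring the keys expected to differ (the inline strings stand in
for the real files):

```python
# Parse a Kafka server.properties-style file into a dict and diff two
# broker configs, ignoring keys that are expected to differ per broker.

def parse_props(text):
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def config_diff(a, b, ignore=("broker.id",)):
    keys = (set(a) | set(b)) - set(ignore)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

broker0 = parse_props("broker.id=0\nport=9092\nnum.partitions=2")
broker1 = parse_props("broker.id=1\nport=9092\nnum.partitions=2")
print(config_diff(broker0, broker1))  # {} -> identical apart from broker.id
```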


I can't understand why broker 0 doesn't act as leader for its partitions,
nor receives replicated data from broker 1. To eliminate the possibility
that the problem comes from the producer, I will run similar tests with
the console producer.

Alex


On 13 June 2013 04:57, Jun Rao <jun...@gmail.com> wrote:

> Any error in state-change.log? Also, are you using the latest code in the
> 0.8 branch?
>
> Thanks,
>
> Jun
>
>
> On Wed, Jun 12, 2013 at 9:27 AM, Alexandre Rodrigues <
> alexan...@blismedia.com> wrote:
>
> > Hi Jun,
> >
> > Thanks for your prompt answer. The producer yields those errors in the
> > beginning, so I think the topic metadata refresh has nothing to do with
> it.
> >
> > The problem is one of the brokers isn't leader on any partition assigned
> to
> > it and because topics were created with a replication factor of 1, the
> > producer will never connect to that broker at all. What I don't
> understand
> > is why doesn't the broker assume the lead of those partitions.
> >
> > I deleted all the topics and tried now with a replication factor of two
> >
> > topic: A  partition: 0    leader: 1       replicas: 1,0   isr: 1
> > topic: A  partition: 1    leader: 0       replicas: 0,1   isr: 0,1
> > topic: B partition: 0    leader: 0       replicas: 0,1   isr: 0,1
> > topic: B partition: 1    leader: 1       replicas: 1,0   isr: 1
> > topic: C      partition: 0    leader: 1       replicas: 1,0   isr: 1
> > topic: C      partition: 1    leader: 0       replicas: 0,1   isr: 0,1
> >
> >
> > Now producer doesn't yield errors. However, one of the brokers ( broker
> 0 )
> > generates lots of lines like this:
> >
> > [2013-06-12 16:19:41,805] WARN [KafkaApi-0] Produce request with
> > correlation id 404999 from client  on partition [B,0] failed due to
> > Partition [B,0] doesn't exist on 0 (kafka.server.KafkaApis)
> >
> > There should be a replica there, so I don't know why it complains about
> > that message.
> >
> > Have you ever found anything like this?
> >
> >
> >
> > On 12 June 2013 16:27, Jun Rao <jun...@gmail.com> wrote:
> >
> > > If the leaders exist in both brokers, the producer should be able to
> > > connect to both of them, assuming you don't provide any key when
> sending
> > > the data. Could you try restarting the producer? If there has been
> broker
> > > failures, it may take topic.metadata.refresh.interval.ms for the
> > producer
> > > to pick up the newly available partitions (see
> > > http://kafka.apache.org/08/configuration.html for details).
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Jun 12, 2013 at 8:01 AM, Alexandre Rodrigues <
> > > alexan...@blismedia.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have a Kafka 0.8 cluster with two nodes connected to three ZKs,
> with
> > > the
> > > > same configuration but the brokerId (one is 0 and the other 1). I
> > created
> > > > three topics A, B and C with 4 partitions and a replication factor of
> > 1.
> > > My
> > > > idea was to have 2 partitions per topic in each broker. However,
> when I
> > > > connect a producer, I can't have both brokers to write at the same
> time
> > > and
> > > > I don't know what's going on.
> > > >
> > > > My server.config has the following entries:
> > > >
> > > > auto.create.topics.enable=true
> > > > num.partitions=2
> > > >
> > > >
> > > > When I run bin/kafka-list-topic.sh --zookeeper localhost:2181   I get
> > the
> > > > following partition leader assignments:
> > > >
> > > > topic: A  partition: 0    leader: 1       replicas: 1     isr: 1
> > > > topic: A  partition: 1    leader: 0       replicas: 0     isr: 0
> > > > topic: A  partition: 2    leader: 1       replicas: 1     isr: 1
> > > > topic: A  partition: 3    leader: 0       replicas: 0     isr: 0
> > > > topic: B partition: 0    leader: 0       replicas: 0     isr: 0
> > > > topic: B partition: 1    leader: 1       replicas: 1     isr: 1
> > > > topic: B partition: 2    leader: 0       replicas: 0     isr: 0
> > > > topic: B partition: 3    leader: 1       replicas: 1     isr: 1
> > > > topic: C      partition: 0    leader: 0       replicas: 0     isr: 0
> > > > topic: C      partition: 1    leader: 1       replicas: 1     isr: 1
> > > > topic: C      partition: 2    leader: 0       replicas: 0     isr: 0
> > > > topic: C      partition: 3    leader: 1       replicas: 1     isr: 1
> > > >
> > > >
> > > > I've forced reassignment using the kafka-reassign-partitions tool
> with
> > > the
> > > > following JSON:
> > > >
> > > > {"partitions":  [
> > > >    {"topic": "A", "partition": 1, "replicas": [0] },
> > > >    {"topic": "A", "partition": 3, "replicas": [0] },
> > > >    {"topic": "A", "partition": 0, "replicas": [1] },
> > > >    {"topic": "A", "partition": 2, "replicas": [1] },
> > > >    {"topic": "B", "partition": 1, "replicas": [0] },
> > > >    {"topic": "B", "partition": 3, "replicas": [0] },
> > > >    {"topic": "B", "partition": 0, "replicas": [1] },
> > > >    {"topic": "B", "partition": 2, "replicas": [1] },
> > > >    {"topic": "C", "partition": 0, "replicas": [0] },
> > > >    {"topic": "C", "partition": 1, "replicas": [1] },
> > > >    {"topic": "C", "partition": 2, "replicas": [0] },
> > > >    {"topic": "C", "partition": 3, "replicas": [1] }
> > > > ]}
> > > >
> > > > After reassignment, I've restarted producer and nothing worked. I've
> > > tried
> > > > also to restart both brokers and producer and nothing.
> > > >
> > > > The producer contains this logs:
> > > >
> > > > 2013-06-12 14:48:46,467] WARN Error while fetching metadata
> >  partition
> > > 0
> > > >     leader: none    replicas:       isr:    isUnderReplicated: false
> > for
> > > > topic partition [C,0]: [class
> kafka.common.LeaderNotAvailableException]
> > > > (kafka.producer.BrokerPartitionInfo)
> > > > [2013-06-12 14:48:46,467] WARN Error while fetching metadata
> > >  partition 0
> > > >     leader: none    replicas:       isr:    isUnderReplicated: false
> > for
> > > > topic partition [C,0]: [class
> kafka.common.LeaderNotAvailableException]
> > > > (kafka.producer.BrokerPartitionInfo)
> > > > [2013-06-12 14:48:46,468] WARN Error while fetching metadata
> > >  partition 2
> > > >     leader: none    replicas:       isr:    isUnderReplicated: false
> > for
> > > > topic partition [C,2]: [class
> kafka.common.LeaderNotAvailableException]
> > > > (kafka.producer.BrokerPartitionInfo)
> > > > [2013-06-12 14:48:46,468] WARN Error while fetching metadata
> > >  partition 2
> > > >     leader: none    replicas:       isr:    isUnderReplicated: false
> > for
> > > > topic partition [C,2]: [class
> kafka.common.LeaderNotAvailableException]
> > > > (kafka.producer.BrokerPartitionInfo)
> > > >
> > > >
> > > > And sometimes lines like this:
> > > >
> > > > [2013-06-12 14:55:37,339] WARN Error while fetching metadata
> > > > [{TopicMetadata for topic B ->
> > > > No partition metadata for topic B due to
> > > > kafka.common.LeaderNotAvailableException}] for topic [B]: class
> > > > kafka.common.LeaderNotAvailableException
> > > >  (kafka.producer.BrokerPartitionInfo)
> > > >
> > > >
> > > > Do you guys have any idea what's going on?
> > > >
> > > > Thanks in advance,
> > > > Alex
> > > >
> > > > --
> > > >
> > > > @BlisMedia <http://twitter.com/BlisMedia>
> > > >
> > > > www.blismedia.com <http://blismedia.com>
> > > >
> > > > This email and any attachments to it may be confidential and are
> > intended
> > > > solely
> > > > for the use of the individual to whom it is addressed. Any views or
> > > > opinions
> > > > expressed are solely those of the author and do not necessarily
> > represent
> > > > those of BlisMedia Ltd, a company registered in England and Wales
> with
> > > > registered number 06455773. Its registered office is 3rd Floor, 101
> New
> > > > Cavendish St, London, W1W 6XH, United Kingdom.
> > > >
> > > > If you are not the intended recipient of this email, you must neither
> > > take
> > > > any action based upon its contents, nor copy or show it to anyone.
> > Please
> > > > contact the sender if you believe you have received this email in
> > error.
> > > >
> > >
> >
>

