Thanks Phil. I added the information to the JIRA. This needs to be
verified, but it's possible that the fix for KAFKA-3306 will also fix
this, as we won't be requesting all the topics any longer.
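For anyone skimming the thread, the leader check Phil walks through below
can be modeled with a minimal, self-contained sketch. The class and method
names here are illustrative stand-ins, not the real client types:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Minimal model of the check Phil describes downthread: after the first
// send, the refreshed cluster view only contains the topic that was
// produced to, so a leader lookup for any partition remembered from the
// earlier all-topics response returns null, and another metadata update
// is requested -- over and over.
public class MetadataLoopSketch {
    // Stands in for RecordAccumulator.ready(): true if any ready
    // partition has no known leader in the current cluster view.
    static boolean unknownLeadersExist(Set<String> readyPartitions,
                                       Map<String, Integer> leaderByPartition) {
        for (String partition : readyPartitions) {
            if (leaderByPartition.get(partition) == null) {
                return true; // mirrors cluster.leaderFor(part) == null
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Cluster view after the single-topic metadata response.
        Map<String, Integer> cluster = new HashMap<>();
        cluster.put("phil-pa-1-device-update-0", 0);

        // Ready set still includes a partition from the all-topics view.
        Set<String> ready = Set.of("phil-pa-1-device-update-0", "stale-topic-0");

        // True here means Sender.run() would call metadata.requestUpdate()
        // again, producing the 100ms request loop seen in the logs.
        System.out.println(unknownLeadersExist(ready, cluster)); // prints "true"
    }
}
```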

Ismael

On Wed, Apr 13, 2016 at 10:35 AM, Phil Luckhurst <phil.luckhu...@encycle.com
> wrote:

> Hi Ismael,
>
> The request does succeed and the reason it keeps requesting is a check in
> the Sender.run(long now) method.
>
>     public void run(long now) {
>         Cluster cluster = metadata.fetch();
>         // get the list of partitions with data ready to send
>         RecordAccumulator.ReadyCheckResult result =
>             this.accumulator.ready(cluster, now);
>
>         // if there are any partitions whose leaders are not known yet,
>         // force metadata update
>         if (result.unknownLeadersExist)
>             this.metadata.requestUpdate();
>
> It looks like the this.accumulator.ready(cluster, now) method checks the
> leader for each partition it holds against the current cluster metadata.
> In this case the original metadata request had an empty topic list, so it
> got information for all partitions, but after using the producer the
> cluster only contains the one topic, which means this check sets
> unknownLeadersExist = true.
>             Node leader = cluster.leaderFor(part);
>             if (leader == null) {
>                 unknownLeadersExist = true;
>
> As you can see above, the Sender.run method checks for this in the result
> and then calls this.metadata.requestUpdate(), which triggers the metadata
> to be requested again. The same thing happens when checking the next
> response, so we're stuck in the loop forever.
>
> Hope this helps,
>
> Phil
>
>
> -----Original Message-----
> From: isma...@gmail.com [mailto:isma...@gmail.com] On Behalf Of Ismael
> Juma
> Sent: 12 April 2016 16:24
> To: users@kafka.apache.org
> Subject: Re: KafkaProducer 0.9.0.1 continually sends metadata requests
>
> Hi Jonathan,
>
> That makes sense, thanks.
>
> Phil, do you know if your metadata requests are succeeding? Retrying every
> retry.backoff.ms is expected if they fail, but not if they succeed. In
> terms of posting a comment to JIRA, you can create an account if you don't
> have one:
>
> https://issues.apache.org/jira/secure/Signup!default.jspa
>
> Ismael
>
> On Tue, Apr 12, 2016 at 3:33 PM, Jonathan Bond <jb...@netflix.com.invalid>
> wrote:
>
> > Ismael, in our case KAFKA-3358 caused frequent metadata requests that
> > ended up swamping the broker. Getting all metadata is a very expensive
> > request; once the request threads spent the majority of their time
> > handling metadata requests, they'd start timing out. Timing out would
> > cause a new request to be made, which would increase the needed
> > metadata level - so when a response finally did arrive it was never
> > recent enough. To resolve it we had to shut down the 7000 clients and
> > let the brokers catch up before other messages would start flowing
> > again. I can't remember if clients were sending at retry.backoff.ms or
> > at the timeout rate (it wasn't a fun day, and I don't want to recreate
> > it to find out). We now make sure we don't start up producers until a
> > message is ready to be sent.
> >
> > On Tue, Apr 12, 2016 at 6:59 AM Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Sorry, I should say that KAFKA-3306 will fix the issue where we
> > > request data for all topics instead of no topics. However, there
> > > seems to be an additional issue where we are performing metadata
> > > requests too frequently. Phil, can you please add this information to
> > > KAFKA-3358 so that we can figure out if more needs to be done?
> > >
> > > Ismael
> > >
> > > On Tue, Apr 12, 2016 at 2:52 PM, Ismael Juma <ism...@juma.me.uk>
> wrote:
> > >
> > > > Note that this should be fixed as part of
> > > > https://issues.apache.org/jira/browse/KAFKA-3306
> > > >
> > > > Ismael
> > > >
> > > > On Tue, Apr 12, 2016 at 2:17 PM, Phil Luckhurst <
> > > > phil.luckhu...@encycle.com> wrote:
> > > >
> > > >> Thanks Jonathan, I didn't spot that JIRA.
> > > >>
> > > >> Phil
> > > >>
> > > >> -----Original Message-----
> > > >> From: Jonathan Bond [mailto:jb...@netflix.com.INVALID]
> > > >> Sent: 12 April 2016 14:08
> > > >> To: users@kafka.apache.org
> > > >> Subject: Re: KafkaProducer 0.9.0.1 continually sends metadata
> > > >> requests
> > > >>
> > > >> Phil,
> > > >> In our case this bug placed significant load on our brokers. We
> > > >> raised a bug, https://issues.apache.org/jira/browse/KAFKA-3358, to
> > > >> get this resolved.
> > > >>
> > > >> On Tue, Apr 12, 2016 at 5:39 AM Phil Luckhurst <
> > > >> phil.luckhu...@encycle.com>
> > > >> wrote:
> > > >>
> > > >> > With debug logging turned on we've sometimes seen our logs
> > > >> > filling up with the Kafka producer sending metadata requests
> > > >> > every 100ms e.g.
> > > >> >
> > > >> > 2016-04-08 10:39:33,592 DEBUG [kafka-producer-network-thread |
> > > >> > phil-pa-1] org.apache.kafka.clients.NetworkClient: Sending metadata
> > > >> > request ClientRequest(expectResponse=true, callback=null,
> > > >> > request=RequestSend(header={api_key=3,api_version=0,correlation_id=249,client_id=phil-pa-1},
> > > >> > body={topics=[phil-pa-1-device-update]}), isInitiatedByNetworkClient,
> > > >> > createdTimeMs=1460108373592, sendTimeMs=0) to node 0
> > > >> > 2016-04-08 10:39:33,592 DEBUG [kafka-producer-network-thread |
> > > >> > phil-pa-1] org.apache.kafka.clients.Metadata: Updated cluster metadata
> > > >> > version 248 to Cluster(nodes = [Node(0, ta-eng-kafka2, 9092)],
> > > >> > partitions = [Partition(topic = phil-pa-1-device-update, partition = 0,
> > > >> > leader = 0, replicas = [0,], isr = [0,]])
> > > >> > 2016-04-08 10:39:33,698 DEBUG [kafka-producer-network-thread |
> > > >> > phil-pa-1] org.apache.kafka.clients.NetworkClient: Sending metadata
> > > >> > request ClientRequest(expectResponse=true, callback=null,
> > > >> > request=RequestSend(header={api_key=3,api_version=0,correlation_id=250,client_id=phil-pa-1},
> > > >> > body={topics=[phil-pa-1-device-update]}), isInitiatedByNetworkClient,
> > > >> > createdTimeMs=1460108373698, sendTimeMs=0) to node 0
> > > >> > 2016-04-08 10:39:33,698 DEBUG [kafka-producer-network-thread |
> > > >> > phil-pa-1] org.apache.kafka.clients.Metadata: Updated cluster metadata
> > > >> > version 249 to Cluster(nodes = [Node(0, ta-eng-kafka2, 9092)],
> > > >> > partitions = [Partition(topic = phil-pa-1-device-update, partition = 0,
> > > >> > leader = 0, replicas = [0,], isr = [0,]])
> > > >> >
> > > >> > These metadata requests continue to be sent every 100ms
> > > >> > (retry.backoff.ms) until we stop the process.
> > > >> >
> > > >> > This only seems to happen if the KafkaProducer instance is
> > > >> > created but not used to publish a message for 5 minutes. After 5
> > > >> > minutes (metadata.max.age.ms) the producer thread sends a metadata
> > > >> > request to the server that has an empty topics list, and the
> > > >> > server responds with the partition information for *all* topics
> > > >> > hosted on the server.
> > > >> >
> > > >> > 2016-04-11 14:16:39,320 DEBUG [kafka-producer-network-thread |
> > > >> > phil-pa-1] org.apache.kafka.clients.NetworkClient: Sending metadata
> > > >> > request ClientRequest(expectResponse=true, callback=null,
> > > >> > request=RequestSend(header={api_key=3,api_version=0,correlation_id=0,client_id=phil-pa-1},
> > > >> > body={topics=[]}), isInitiatedByNetworkClient,
> > > >> > createdTimeMs=1460380599289, sendTimeMs=0) to node -1
> > > >> >
> > > >> > If we then use that KafkaProducer instance to send a message, the
> > > >> > next 'Sending metadata request' will just be for the topic we
> > > >> > have sent the message to, and this then triggers the flood of
> > > >> > retry requests as noted above.
> > > >> >
> > > >> > If we ensure we send the first message within the time set by
> > > >> > metadata.max.age.ms (default 5 minutes) then everything works as
> > > >> > expected and the metadata requests do not continually get
> > > >> > retried.
> > > >> >
> > > >> > In many cases I can understand that creating a KafkaProducer and
> > > >> > then not using it within 5 minutes is not usual, but in our case
> > > >> > we're creating it when our REST based application starts up and
> > > >> > we can't guarantee that a message will be published within that
> > > >> > time. To get around this we are currently posting a test message
> > > >> > to the topic right after creating the KafkaProducer, which
> > > >> > prevents it happening.
> > > >> >
> > > >> > I'm not sure if this is worthy of a fix but I thought I'd post
> > > >> > it here in case someone else hits the same problem.
> > > >> >
> > > >> > Regards,
> > > >> > Phil Luckhurst
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
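For readers finding this thread later: the warm-up workaround Phil
describes (posting a test message right after creating the producer) can
be sketched as below. This is a hypothetical wrapper, not the real client
API; SimpleProducer stands in for KafkaProducer so the sketch stays
self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the workaround: fire one priming send immediately after
// construction so the producer's first metadata refresh is for a real
// topic rather than the empty topic list sent after metadata.max.age.ms.
// SimpleProducer is a stand-in; in real code the delegate would be a
// KafkaProducer and send() would take a ProducerRecord.
interface SimpleProducer {
    void send(String topic, String value);
}

public class WarmedUpProducer implements SimpleProducer {
    private final SimpleProducer delegate;

    public WarmedUpProducer(SimpleProducer delegate, String warmupTopic) {
        this.delegate = delegate;
        // Priming message: keeps the producer's known-topic list non-empty.
        delegate.send(warmupTopic, "producer-warmup");
    }

    @Override
    public void send(String topic, String value) {
        delegate.send(topic, value);
    }

    public static void main(String[] args) {
        // Recording fake to show the priming send happens on construction.
        List<String> sent = new ArrayList<>();
        SimpleProducer recording = (t, v) -> sent.add(t + ":" + v);
        new WarmedUpProducer(recording, "phil-pa-1-device-update");
        System.out.println(sent); // the warm-up message was sent at construction time
    }
}
```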
