Thanks Phil. I added the information to the JIRA. This needs to be verified, but it may be possible that the fix for KAFKA-3306 will also fix this, as we won't be requesting all the topics any longer.

Ismael
On Wed, Apr 13, 2016 at 10:35 AM, Phil Luckhurst <phil.luckhu...@encycle.com> wrote:

> Hi Ismael,
>
> The request does succeed and the reason it keeps requesting is a check in
> the Sender.run(long now) method.
>
> public void run(long now) {
>     Cluster cluster = metadata.fetch();
>     // get the list of partitions with data ready to send
>     RecordAccumulator.ReadyCheckResult result =
>         this.accumulator.ready(cluster, now);
>
>     // if there are any partitions whose leaders are not known yet,
>     // force metadata update
>     if (result.unknownLeadersExist)
>         this.metadata.requestUpdate();
>
> It looks like the this.accumulator.ready(cluster, now) method checks the
> leader for each partition in the response against what it already had. In
> this case the original metadata request had the empty topic list, so it got
> information for all partitions, but after using the producer the cluster
> only has the one topic in it, which means this check sets
> unknownLeadersExist = true.
>
> Node leader = cluster.leaderFor(part);
> if (leader == null) {
>     unknownLeadersExist = true;
>
> As you can see above, the Sender.run method checks for this in the result
> and then calls this.metadata.requestUpdate(), which triggers the metadata
> to be requested again. And of course the same thing happens when checking
> the next response, and we're suddenly in the loop forever.
>
> Hope this helps,
>
> Phil
>
>
> -----Original Message-----
> From: isma...@gmail.com [mailto:isma...@gmail.com] On Behalf Of Ismael Juma
> Sent: 12 April 2016 16:24
> To: users@kafka.apache.org
> Subject: Re: KafkaProducer 0.9.0.1 continually sends metadata requests
>
> Hi Jonathan,
>
> That makes sense, thanks.
>
> Phil, do you know if your metadata requests are succeeding? Retrying every
> retry.backoff.ms is expected if they fail, but not if they succeed. In
> terms of posting a comment to JIRA, you can create an account if you don't
> have one:
>
> https://issues.apache.org/jira/secure/Signup!default.jspa
>
> Ismael
>
> On Tue, Apr 12, 2016 at 3:33 PM, Jonathan Bond <jb...@netflix.com.invalid> wrote:
>
> > Ismael,
> > In our case KAFKA-3358 caused frequent metadata requests because we
> > ended up swamping the broker. Getting all metadata is a very expensive
> > request; once all request threads spent the majority of their time
> > handling metadata requests, they'd start timing out. Timing out would
> > cause a new request to be made, which would increase the needed
> > metadata level - so when a response finally did arrive it was never
> > recent enough. To resolve this we had to shut down the 7000 clients and
> > let the brokers catch up, for other messages to start flowing again. I
> > can't remember if clients were sending at retry.backoff.ms or timeout
> > rate (it wasn't a fun day, and I don't want to recreate it to find
> > out). We now make sure we don't start up producers until a message is
> > ready to be sent.
> >
> > On Tue, Apr 12, 2016 at 6:59 AM Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Sorry, I should say that KAFKA-3306 will fix the issue where we
> > > request data for all topics instead of no topics. However, there
> > > seems to be an additional issue where we are performing metadata
> > > requests too frequently. Phil, can you please add this information
> > > to KAFKA-3358 so that we can figure out if more needs to be done?
> > >
> > > Ismael
> > >
> > > On Tue, Apr 12, 2016 at 2:52 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > > > Note that this should be fixed as part of
> > > > https://issues.apache.org/jira/browse/KAFKA-3306
> > > >
> > > > Ismael
> > > >
> > > > On Tue, Apr 12, 2016 at 2:17 PM, Phil Luckhurst <phil.luckhu...@encycle.com> wrote:
> > > >
> > > >> Thanks Jonathan, I didn't spot that JIRA.
> > > >>
> > > >> Phil
> > > >>
> > > >> -----Original Message-----
> > > >> From: Jonathan Bond [mailto:jb...@netflix.com.INVALID]
> > > >> Sent: 12 April 2016 14:08
> > > >> To: users@kafka.apache.org
> > > >> Subject: Re: KafkaProducer 0.9.0.1 continually sends metadata requests
> > > >>
> > > >> Phil,
> > > >> In our case this bug placed significant load on our brokers. We
> > > >> raised a bug https://issues.apache.org/jira/browse/KAFKA-3358 to
> > > >> get this resolved.
> > > >>
> > > >> On Tue, Apr 12, 2016 at 5:39 AM Phil Luckhurst <phil.luckhu...@encycle.com> wrote:
> > > >>
> > > >> > With debug logging turned on we've sometimes seen our logs
> > > >> > filling up with the Kafka producer sending metadata requests
> > > >> > every 100ms, e.g.
> > > >> >
> > > >> > 2016-04-08 10:39:33,592 DEBUG [kafka-producer-network-thread | phil-pa-1]
> > > >> > org.apache.kafka.clients.NetworkClient: Sending metadata request
> > > >> > ClientRequest(expectResponse=true, callback=null,
> > > >> > request=RequestSend(header={api_key=3,api_version=0,correlation_id=249,client_id=phil-pa-1},
> > > >> > body={topics=[phil-pa-1-device-update]}), isInitiatedByNetworkClient,
> > > >> > createdTimeMs=1460108373592, sendTimeMs=0) to node 0
> > > >> > 2016-04-08 10:39:33,592 DEBUG [kafka-producer-network-thread | phil-pa-1]
> > > >> > org.apache.kafka.clients.Metadata: Updated cluster metadata version
> > > >> > 248 to Cluster(nodes = [Node(0, ta-eng-kafka2, 9092)], partitions =
> > > >> > [Partition(topic = phil-pa-1-device-update, partition = 0, leader = 0,
> > > >> > replicas = [0,], isr = [0,]])
> > > >> > 2016-04-08 10:39:33,698 DEBUG [kafka-producer-network-thread | phil-pa-1]
> > > >> > org.apache.kafka.clients.NetworkClient: Sending metadata request
> > > >> > ClientRequest(expectResponse=true, callback=null,
> > > >> > request=RequestSend(header={api_key=3,api_version=0,correlation_id=250,client_id=phil-pa-1},
> > > >> > body={topics=[phil-pa-1-device-update]}), isInitiatedByNetworkClient,
> > > >> > createdTimeMs=1460108373698, sendTimeMs=0) to node 0
> > > >> > 2016-04-08 10:39:33,698 DEBUG [kafka-producer-network-thread | phil-pa-1]
> > > >> > org.apache.kafka.clients.Metadata: Updated cluster metadata version
> > > >> > 249 to Cluster(nodes = [Node(0, ta-eng-kafka2, 9092)], partitions =
> > > >> > [Partition(topic = phil-pa-1-device-update, partition = 0, leader = 0,
> > > >> > replicas = [0,], isr = [0,]])
> > > >> >
> > > >> > These metadata requests continue to be sent every 100ms
> > > >> > (retry.backoff.ms) until we stop the process.
> > > >> >
> > > >> > This only seems to happen if the KafkaProducer instance is
> > > >> > created but not used to publish a message for 5 minutes.
> > > >> > After 5 minutes (metadata.max.age.ms) the producer thread sends
> > > >> > a metadata request to the server that has an empty topics list,
> > > >> > and the server responds with the partition information for *all*
> > > >> > topics hosted on the server.
> > > >> >
> > > >> > 2016-04-11 14:16:39,320 DEBUG [kafka-producer-network-thread | phil-pa-1]
> > > >> > org.apache.kafka.clients.NetworkClient: Sending metadata request
> > > >> > ClientRequest(expectResponse=true, callback=null,
> > > >> > request=RequestSend(header={api_key=3,api_version=0,correlation_id=0,client_id=phil-pa-1},
> > > >> > body={topics=[]}), isInitiatedByNetworkClient,
> > > >> > createdTimeMs=1460380599289, sendTimeMs=0) to node -1
> > > >> >
> > > >> > If we then use that KafkaProducer instance to send a message, the
> > > >> > next 'Sending metadata request' will just be for the topic we have
> > > >> > sent the message to, and this then triggers the flood of retry
> > > >> > requests as noted above.
> > > >> >
> > > >> > If we ensure we send the first message within the time set by
> > > >> > metadata.max.age.ms (default 5 minutes) then everything works as
> > > >> > expected and the metadata requests do not continually get retried.
> > > >> >
> > > >> > In many cases I can understand that creating a KafkaProducer and
> > > >> > then not using it within 5 minutes is not usual, but in our case
> > > >> > we're creating it when our REST based application starts up and we
> > > >> > can't guarantee that a message will be published within that time.
> > > >> > To get around this we are currently posting a test message to the
> > > >> > topic right after creating the KafkaProducer, which prevents it
> > > >> > happening.
> > > >> >
> > > >> > I'm not sure if this is worthy of a fix but I thought I'd post it
> > > >> > here in case someone else hits the same problem.
> > > >> >
> > > >> > Regards,
> > > >> > Phil Luckhurst
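For anyone hitting this on 0.9.0.1 before the fixes land, here is a minimal sketch of the workaround Phil describes: warm the producer's metadata for the topic right after constructing it, so the refresh at metadata.max.age.ms no longer happens with an empty topic list. The broker address and topic name are taken from the logs above and are placeholders for your own. The partitionsFor() call stands in for Phil's test message; it forces a blocking metadata fetch for that specific topic without publishing anything, but treat it as an unverified alternative to actually sending a record.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WarmedUpProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "ta-eng-kafka2:9092"); // broker from the logs above
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        String topic = "phil-pa-1-device-update"; // topic from the logs above

        // Fetch this topic's metadata immediately, before the first refresh
        // at metadata.max.age.ms (default 5 minutes) replaces the initial
        // all-topics response. partitionsFor() blocks until metadata for the
        // topic is available, so the producer's cluster view already knows
        // the topic's leader when the first real record is sent.
        producer.partitionsFor(topic);

        // ... later, possibly more than 5 minutes after startup, the first
        // real send should no longer set unknownLeadersExist and trigger
        // the 100ms metadata retry loop described in this thread.
        producer.send(new ProducerRecord<>(topic, "key", "value")).get();
        producer.close();
    }
}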