Ismael, In our case KAFKA-3358 caused frequent metadata requests because we ended up swamping the broker. Getting all metadata is a very expensive request, once all requests threads spent their majority of time handling metadata requests, they'd start timing out. Timing out would cause a new request to be made, which would increase the needed metadata level - so when a response finally did arrive it was never recent enough. To resolve we had to shut down the 7000 clients and let the brokers catch-up, for other messages to start flowing again. I can't remember if clients were sending at retry.backoff.ms or timeout rate (it wasn't a fun day, and I don't want to recreate to find out). We now make sure we don't start up producers until a message is ready to be sent.
On Tue, Apr 12, 2016 at 6:59 AM Ismael Juma <ism...@juma.me.uk> wrote: > Sorry, I should say that KAFKA-3306 will fix the issue where we request > data for all topics instead of no topics. However, it seems like there is > an additional issue where it seems like we are performing metadata requests > too frequently. Phil, can you please add this information to KAFKA-3358 so > that we can figure out if more needs to be done? > > Ismael > > On Tue, Apr 12, 2016 at 2:52 PM, Ismael Juma <ism...@juma.me.uk> wrote: > > > Note that this should be fixed as part of > > https://issues.apache.org/jira/browse/KAFKA-3306 > > > > Ismael > > > > On Tue, Apr 12, 2016 at 2:17 PM, Phil Luckhurst < > > phil.luckhu...@encycle.com> wrote: > > > >> Thanks Jonathan, I didn't spot that JIRA. > >> > >> Phil > >> > >> -----Original Message----- > >> From: Jonathan Bond [mailto:jb...@netflix.com.INVALID] > >> Sent: 12 April 2016 14:08 > >> To: users@kafka.apache.org > >> Subject: Re: KafkaProducer 0.9.0.1 continually sends metadata requests > >> > >> Phil, > >> In our case this bug placed significant load on our brokers. We raised a > >> bug https://issues.apache.org/jira/browse/KAFKA-3358 to get this > >> resolved. > >> > >> On Tue, Apr 12, 2016 at 5:39 AM Phil Luckhurst < > >> phil.luckhu...@encycle.com> > >> wrote: > >> > >> > With debug logging turned on we've sometimes seen our logs filling up > >> > with the kafka producer sending metadata requests every 100ms e.g. > >> > > >> > 2016-04-08 10:39:33,592 DEBUG [kafka-producer-network-thread | > >> > phil-pa-1] > >> > org.apache.kafka.clients.NetworkClient: Sending metadata request > >> > ClientRequest(expectResponse=true, callback=null, > >> > request=RequestSend(header={api_key=3,api_version=0,correlation_id=249 > >> > ,client_id=phil-pa-1}, body={topics=[phil-pa-1-device-update]}), > >> > isInitiatedByNetworkClient, createdTimeMs=1460108373592, sendTimeMs=0) > >> > to node 0 > >> > 2016-04-08 10:39:33,592 DEBUG [kafka-producer-network-thread | > >> > phil-pa-1] > >> > org.apache.kafka.clients.Metadata: Updated cluster metadata version > >> > 248 to Cluster(nodes = [Node(0, ta-eng-kafka2, 9092)], partitions = > >> > [Partition(topic = phil-pa-1-device-update, partition = 0, leader = 0, > >> > replicas = [0,], isr = [0,]]) > >> > 2016-04-08 10:39:33,698 DEBUG [kafka-producer-network-thread | > >> > phil-pa-1] > >> > org.apache.kafka.clients.NetworkClient: Sending metadata request > >> > ClientRequest(expectResponse=true, callback=null, > >> > request=RequestSend(header={api_key=3,api_version=0,correlation_id=250 > >> > ,client_id=phil-pa-1}, body={topics=[phil-pa-1-device-update]}), > >> > isInitiatedByNetworkClient, createdTimeMs=1460108373698, sendTimeMs=0) > >> > to node 0 > >> > 2016-04-08 10:39:33,698 DEBUG [kafka-producer-network-thread | > >> > phil-pa-1] > >> > org.apache.kafka.clients.Metadata: Updated cluster metadata version > >> > 249 to Cluster(nodes = [Node(0, ta-eng-kafka2, 9092)], partitions = > >> > [Partition(topic = phil-pa-1-device-update, partition = 0, leader = 0, > >> > replicas = [0,], isr = [0,]]) > >> > > >> > These metadata requests continue to be sent every 100ms > >> > (retry.backoff.ms) until we stop the process. > >> > > >> > This only seems to happen if the KafkaProducer instance is created but > >> > not used to publish a message for 5 minutes. After 5 minutes ( > >> > metadata.max.age.ms) the producer thread sends a metadata request to > >> > the server that has an empty topics list and the server responds with > >> > the partition information for *all* topics hosted on the server. > >> > > >> > 2016-04-11 14:16:39,320 DEBUG [kafka-producer-network-thread | > >> > phil-pa-1] > >> > org.apache.kafka.clients.NetworkClient: Sending metadata request > >> > ClientRequest(expectResponse=true, callback=null, > >> > request=RequestSend(header={api_key=3,api_version=0,correlation_id=0,c > >> > lient_id=phil-pa-1}, body={topics=[]}), isInitiatedByNetworkClient, > >> > createdTimeMs=1460380599289, > >> > sendTimeMs=0) to node -1 > >> > > >> > If we then use that KafkaProducer instance to send a message the next > >> > 'Sending meta request' will just be for the topic we have sent the > >> > message to and this then triggers the flood of retry requests as noted > >> above. > >> > > >> > If we ensure we send the first message within the time set by > >> > metadata.max.age.ms (default 5 minutes) then everything works as > >> > expected and the metadata requests do not continually get retried. > >> > > >> > In many cases I can understand that creating a KafkaProducer and then > >> > not using it within 5 minutes is not usual but in our case we're > >> > creating it when our REST based application starts up and we can't > >> > guarantee that a message will be published within that time. To get > >> > around this we are currently posting a test message to the topic right > >> > after creating the KafkaProducer prevents it happening. > >> > > >> > I'm not sure if this is worthy of a fix but I thought I'd post it here > >> > in case someone else hits the same problem. > >> > > >> > Regards, > >> > Phil Luckhurst > >> > > >> > > > > >