[ https://issues.apache.org/jira/browse/KAFKA-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-901:
--------------------------------

    Attachment: metadata-request-improvement.patch

This patch changes the way a Kafka server handles metadata requests. Metadata
requests are currently a bottleneck in the system since the server reads
several zookeeper paths to serve a single metadata request, and the latency is
proportional to the number of topics in the request. A faster way to serve
metadata requests is through the controller: since the controller makes all
state change decisions for the cluster, it keeps a cache with the latest
leadership information. The patch uses the following algorithm to serve
metadata requests (a sketch in Scala follows the list) -

1. If the broker is the controller, read leadership information from the cache
and send the response
2. If the broker is not the controller, instantiate a sync producer to forward
the metadata request to the controller
3. If the broker is not the controller and the controller is unavailable, send
a ControllerNotAvailable error code back to the client
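
As a rough illustration of these three cases, here is a minimal sketch in
Scala. The names (MetadataResponse, readFromCache, forwardToController, the
error code value) are illustrative only and are not the identifiers used in
the patch.

object MetadataRequestDispatchSketch {

  case class MetadataResponse(errorCode: Short, payload: String)

  val NoError: Short = 0
  val ControllerNotAvailableCode: Short = -1  // placeholder value, not the real error code

  // isController: is this broker currently the controller?
  // controllerEndpoint: cached (host, port) of the controller, if known
  // readFromCache: reads leadership information from the controller's local cache
  // forwardToController: forwards the request to the controller via a sync-producer-style client
  def handle(isController: Boolean,
             controllerEndpoint: Option[(String, Int)],
             readFromCache: () => String,
             forwardToController: ((String, Int)) => MetadataResponse): MetadataResponse = {
    if (isController)
      MetadataResponse(NoError, readFromCache())                    // case 1: serve from local cache
    else controllerEndpoint match {
      case Some(endpoint) => forwardToController(endpoint)          // case 2: forward to the controller
      case None => MetadataResponse(ControllerNotAvailableCode, "") // case 3: controller unknown/unavailable
    }
  }
}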

There are some things to note here -

1. How does a broker know who the current controller is, and its host/port? If
we read this from zookeeper on every request, metadata requests stay slow,
although less slow than before. However, upon controller failover, the
controller sends a LeaderAndIsr request to all* brokers. So when a broker
receives any state change request from the controller, it reads the
controller's host/port from zookeeper and caches it. Since state change
requests are rare, this is ok.

Now there is a corner case we need to take care of. If a new broker is brought
up and it doesn't have any partitions assigned to it, it won't receive any
LeaderAndIsr request in the current code base. This patch takes care of that by
changing the controller to always send a LeaderAndIsr request to newly
restarted brokers, even if there are no partitions in the request (a sketch of
the endpoint caching follows).
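
A minimal sketch of that endpoint caching, under the assumption that the
broker can look up the controller's host/port (e.g. from zookeeper) at the
moment it receives a state change request; the class and method names here are
hypothetical.

class ControllerEndpointCache {
  // Cached controller endpoint; written rarely (on state change requests),
  // read on every forwarded metadata request, so a volatile reference suffices.
  @volatile private var endpoint: Option[(String, Int)] = None

  // Called whenever a LeaderAndIsr (or any other state change) request arrives.
  // This is the only place the controller's host/port would be fetched from
  // zookeeper, keeping the metadata request path free of zookeeper reads.
  def onStateChangeRequest(controllerHost: String, controllerPort: Int): Unit =
    endpoint = Some((controllerHost, controllerPort))

  def currentController: Option[(String, Int)] = endpoint
}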

2. What timeout should be used when a broker wants to forward a metadata
request to the controller?
Since there isn't a way to know the timeout specified by the original metadata
request, we can potentially set the socket timeout for the forwarded request to
Integer.MAX_VALUE. This ensures that the forwarded request will not prematurely
time out. However, we need to ensure that the controller always sends a
response back to the broker OR closes the socket. This prevents the broker's
I/O thread from waiting indefinitely for a response from the controller. It
also requires the controller to not block when serving metadata from its cache,
OR it will block some other broker's I/O thread, which is bad.
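
A sketch of the socket setup on the forwarding path (request/response framing
elided). Integer.MAX_VALUE here effectively means "never time out", which is
only safe because the controller is guaranteed to respond or close the
connection; the object and method names are illustrative.

import java.net.{InetSocketAddress, Socket}

object ForwardingSocketSketch {
  // Opens a blocking connection to the controller with an effectively unbounded
  // read timeout, so the forwarded metadata request never times out prematurely.
  def openForwardingSocket(controllerHost: String, controllerPort: Int): Socket = {
    val socket = new Socket()
    socket.connect(new InetSocketAddress(controllerHost, controllerPort))
    socket.setSoTimeout(Integer.MAX_VALUE) // ~24 days; relies on the controller responding or closing
    socket
  }
}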

3. On the controller, should it acquire a lock when reading its cache to serve
a metadata request?

No. Acquiring a lock is not required and is potentially dangerous. It is not
required since even if the controller's cache is undergoing change and we send
stale information to the client, the client will get an error and retry.
Eventually the cache will be consistent and accurate metadata will be sent to
the client. This is also ok since changes to the controller's cache are
relatively rare, so the chances of stale metadata are low.
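
One way to see why no lock is needed is to keep the cache as an immutable
snapshot behind a volatile reference: readers never block, and at worst they
see a slightly stale view that the client will retry past. This is a
hypothetical sketch, not the structure of the controller's actual cache.

class LeadershipCacheSketch {
  // Immutable snapshot mapping "topic-partition" -> leader broker id, swapped
  // atomically by the controller on state changes.
  @volatile private var snapshot: Map[String, Int] = Map.empty

  def update(newLeaders: Map[String, Int]): Unit =
    snapshot = newLeaders          // writer replaces the whole map; no reader lock needed

  def leaderFor(topicPartition: String): Option[Int] =
    snapshot.get(topicPartition)   // lock-free read of a consistent (possibly stale) snapshot
}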

4. Do we need to make the timeout for the forwarded metadata request
configurable?

Ideally no, except for unit tests. The reason is that unit tests involve
shutting down kafka brokers, and when a broker is shut down in a unit test, it
does not close the socket connections associated with that broker. As a result,
if a controller broker is shut down, it will not respond to a few forwarded
metadata requests and will not close the socket. That leaves some other broker
waiting indefinitely for a response from the controller. This doesn't happen
outside of unit tests, since if the broker is shut down or fails, the process
releases all socket connections associated with it. So the other broker gets a
broken pipe exception and doesn't end up waiting forever.
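
If the timeout ever did need to be configurable for tests, the wiring could
look like the sketch below. The property name is made up for illustration and
is not an actual Kafka config; production code would fall back to the
Integer.MAX_VALUE default, while unit tests could set a small value so a
shut-down controller doesn't leave another broker hanging.

import java.util.Properties

object ForwardTimeoutConfigSketch {
  // Hypothetical property name, for illustration only.
  val PropName = "controller.forward.socket.timeout.ms"

  // Returns the configured timeout, defaulting to an effectively unbounded value.
  def forwardSocketTimeoutMs(props: Properties): Int =
    Option(props.getProperty(PropName)).map(_.toInt).getOrElse(Integer.MAX_VALUE)
}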
                
> Kafka server can become unavailable if clients send several metadata requests
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-901
>                 URL: https://issues.apache.org/jira/browse/KAFKA-901
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Blocker
>         Attachments: metadata-request-improvement.patch
>
>
> Currently, if a broker is bounced without controlled shutdown and there are
> several clients talking to the Kafka cluster, each of the clients realizes the
> unavailability of leaders for some partitions. This leads to several metadata
> requests sent to the Kafka brokers. Since metadata requests are pretty slow,
> all the I/O threads quickly become busy serving the metadata requests. This
> leads to a full request queue, which stalls handling of finished responses
> since the same network thread handles requests as well as responses. In this
> situation, clients time out on metadata requests and send more metadata
> requests. This quickly makes the Kafka cluster unavailable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
