Re: KIP-4 Wiki Update

Grant Henke Tue, 15 Mar 2016 07:50:53 -0700

Moving the relevant wiki text here for discussion/tracking:
>
> Server-side Admin Request handlers
>
> At the highest level, admin requests will be handled on the brokers the
> same way that all message types are. However, because admin messages modify
> cluster metadata they should be handled by the controller. This allows the
> controller to propagate the changes to the rest of the cluster. However,
> the fact that the messages need to be handled by the controller does not
> necessarily mean they need to be sent directly to the controller. A message
> forwarding mechanism can be used to forward the message from any broker to
> the correct broker for handling.
>
> Because supporting all of this is quite the undertaking, I will describe
> the "ideal functionality" and then the "intermediate functionality" that
> gets us some basic administrative support quickly while working towards the
> optimal state.
>
> *Ideal Functionality:*
>
>    1. A client sends an admin request to *any* broker
>    2. The admin request is forwarded to the required broker (likely the
>    controller)
>    3. The request is handled and the server blocks until a timeout is
>    reached or the requested operation is completed (failure or success)
>       1. An operation is considered complete/successful when *all
>       required nodes have the correct/current state*.
>       2. Immediate follow up requests to *any broker* will succeed.
>       3. Requests that time out may still be completed after the timeout.
>       The users would need to poll to check the state.
>    4. The response is generated and forwarded back to the broker that
>    received the request.
>    5. A response is sent back to the client.
>
> *Intermediate Functionality*:
>
>    1. A client sends an admin request to *the controller* broker
>       1. As a follow up, request forwarding can be added transparently.
>       (see below)
>    2. The request is handled and the server blocks until a timeout is
>    reached or the requested operation is completed (failure or success)
>       1. An operation is considered complete/successful when *the
>       controller node has the correct/current state.*
>       2. Immediate follow up requests to *the controller* will succeed.
>       Others (not to the controller) are likely to succeed or cause a
>       retriable exception that would eventually succeed.
>       3. Requests that time out may still be completed after the timeout.
>       The users would need to poll to check the state.
>    3. A response is sent back to the client.
>
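
To make the intermediate flow concrete, here is a minimal sketch of "block until the controller's local state is correct or the timeout is reached, then let the client poll". It is a toy model only: the `Controller` class, the `create_topic` and `topic_exists` methods, the `poll_until` helper, and the error code strings are all hypothetical names chosen for illustration, not Kafka code or part of the proposed protocol.

```python
import threading
import time

# Hypothetical error codes for this sketch; not Kafka's actual codes.
NONE = "NONE"
REQUEST_TIMED_OUT = "REQUEST_TIMED_OUT"


class Controller:
    """Toy stand-in for the controller broker."""

    def __init__(self):
        # topic name -> Event that is set once the controller's local state is current
        self._topics = {}
        self._lock = threading.Lock()

    def create_topic(self, name, timeout_s, work_delay_s=0.0):
        """Start the operation, then block until it completes locally or times out."""
        done = threading.Event()
        with self._lock:
            self._topics[name] = done
        # The operation keeps running in the background even if the caller times out.
        threading.Timer(work_delay_s, done.set).start()
        return NONE if done.wait(timeout_s) else REQUEST_TIMED_OUT

    def topic_exists(self, name):
        """Polling hook: True once the controller's local state has the topic."""
        with self._lock:
            done = self._topics.get(name)
        return done is not None and done.is_set()


def poll_until(check, deadline_s, interval_s=0.01):
    """Client-side polling: call check() until it returns True or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if check():
            return True
        time.sleep(interval_s)
    return check()
```

A client that gets `REQUEST_TIMED_OUT` back cannot assume the operation failed; it would call something like `poll_until(lambda: controller.topic_exists("my-topic"), 5.0)` to find out whether the request completed after the timeout.
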
> The ideal functionality has 2 features that are more challenging
> initially. For that reason those features will be removed from the initial
> changes, but will be tracked as follow up improvements. However, this
> intermediate solution should allow for a relatively transparent transition
> to the ideal functionality.
>
> *Request Forwarding: KAFKA-1912
> <https://issues.apache.org/jira/browse/KAFKA-1912>*
>
> Request forwarding is relevant to any message that needs to be sent to the
> "correct" broker (ex: partition leader, group coordinator, etc). Though at
> first it may seem simple, it has many technical challenges that need to be
> decided in regards to connections, failure, retries, etc. Today, we depend
> on the client to choose the correct broker and clients that want to utilize
> the cluster "optimally" would likely continue to do so. For those reasons
> it can be handled generically as an independent feature.
>
> *Cluster Consistent Blocking:*
>
> Blocking an admin request until the entire cluster is aware of the
> correct/current state is difficult based on Kafka's current approach for
> propagating metadata. This approach varies based on the metadata being
> changed.
>
>    - Topic metadata changes are propagated via UpdateMetadata and
>    LeaderAndIsr requests
>    - Config changes are propagated via ZooKeeper and listeners
>    - ACL changes depend on the implementation of the Authorizer interface
>       - The default SimpleAclAuthorizer uses ZooKeeper and listeners
>
> Though all of these mechanisms are different, they are all
> "eventually consistent". None of the mechanisms, as currently implemented,
> will block until the metadata has been propagated successfully. Changing
> this behavior would require a large amount of change to the
> KafkaController, additional inter-broker messages, and potentially a change
> to the Authorizer interface. These are all changes that should not
> block the implementation of KIP-4.
>
> The intermediate changes in KIP-4 should allow an easy transition to
> "complete blocking" when the work can be done. This is supported by
> providing *optional* local blocking in the meantime. This local blocking
> only blocks until the local state on the controller is correct. We will
> still provide a polling mechanism for users that do not want to block at
> all. A polling mechanism is required in the optimal implementation too:
> operations like "create topic" are not transactional, so users still need
> a way to check state after a timeout occurs. Local blocking has the added
> benefit of avoiding wasted poll requests to other brokers when it's
> impossible for the request to be completed. If the controller's state is
> not correct, then the other brokers' state can't be either. Clients who
> don't want to validate that the entire cluster state is correct can block
> on the controller and avoid polling altogether, with reasonable confidence
> that, though they may get a retriable error on follow up requests, the
> requested change was successful and the cluster will be accurate
> eventually.
>
> Because we already add a timeout field to the request wire protocols,
> changing the behavior to block until the cluster is consistent in the
> future would not require a protocol change, though the version could be
> bumped to indicate a behavior change.
>
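
As a rough illustration of why the same timeout field can serve both behaviors, the sketch below keeps the request shape fixed (topic name plus timeout) and swaps only the completion check between "controller has the state" and "all brokers have the state". Everything here is an assumption made for illustration: the `Broker` class, the `handle_create_topic` function, the error code strings, and the simulated propagation (one broker catches up per 1 ms tick) are hypothetical and do not reflect how Kafka actually propagates metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy broker holding only the set of topic names it knows about."""
    broker_id: int
    topics: set = field(default_factory=set)


def locally_complete(controller, others, topic):
    """Intermediate behavior: done once the controller has the state."""
    return topic in controller.topics


def cluster_complete(controller, others, topic):
    """Ideal behavior: done once every broker has the state."""
    return all(topic in b.topics for b in [controller, *others])


def handle_create_topic(topic, timeout_ms, controller, others, is_complete):
    """Apply the change on the controller, then wait in simulated 1 ms ticks
    until is_complete(...) holds or timeout_ms ticks have elapsed."""
    controller.topics.add(topic)
    pending = list(others)  # brokers that have not yet seen the change
    for elapsed_ms in range(timeout_ms + 1):
        if is_complete(controller, others, topic):
            return {"error": "NONE", "elapsed_ms": elapsed_ms}
        if pending:
            # Simulated eventual consistency: one broker catches up per tick.
            pending.pop(0).topics.add(topic)
    return {"error": "REQUEST_TIMED_OUT", "elapsed_ms": timeout_ms}
```

The caller passes the same `topic` and `timeout_ms` either way; only the server-side `is_complete` check differs, which is the sense in which moving to cluster-consistent blocking later would be a behavior change rather than a protocol change.
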


Thanks,
Grant


On Mon, Mar 14, 2016 at 5:07 PM, Grant Henke <[email protected]> wrote:

> I have been updating the KIP-4 wiki page based on the last KIP call and
> wanted to get some review and discussion around the server side
> implementation for admin requests, covering both the "ideal" functionality
> and the "intermediate" functionality. The updates are still in progress, but this
> section is the most critical and will likely have the most discussion. This
> topic has had a few shifts in perspective and various discussions on
> synchronous vs asynchronous server support. The wiki contains my current
> perspective on the challenges and approach.
>
> If you have any thoughts or feedback on the "Server-side Admin Request
> handlers" section here
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-2.Server-sideAdminRequesthandlers>,
> let's discuss them in this thread.
>
> For reference the last KIP discussion can be viewed here:
> https://youtu.be/rFW0-zJqg5I?t=12m30s
>
> Thank you,
> Grant
> --
> Grant Henke
> Software Engineer | Cloudera
> [email protected] | twitter.com/gchenke | linkedin.com/in/granthenke
>



-- 
Grant Henke
Software Engineer | Cloudera
[email protected] | twitter.com/gchenke | linkedin.com/in/granthenke
