Re: KIP-4 Wiki Update

Dana Powers Wed, 30 Mar 2016 08:50:21 -0700

Grant - sorry I was unable to attend. Getting API access to admin
functionality has been a big ask for Python client users. I like this KIP a
lot.


I reviewed the details quickly. Here are some comments:

MetadataRequest v1: long-term / conceptually, I think a "null" topic list
aligns better with fetching all topics. An empty list aligns better with
fetching no topics. I recognize this means that an empty list behaves
differently in v0 versus v1. But hey, what are protocol versions good for
if not changing behavior... :) Just an API design comment; take it or leave it.
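
To make the null-versus-empty distinction concrete, here is a minimal Python
sketch of the v1 semantics I have in mind. The function name and the in-memory
topic list are hypothetical, purely for illustration; this is not the broker
code or any client library API.

```python
from typing import List, Optional


def select_topics(requested: Optional[List[str]], known: List[str]) -> List[str]:
    """Hypothetical v1-style topic selection.

    None ("null" on the wire) means "all topics"; an empty list means
    "no topics". Handling of unknown topic names is omitted here.
    """
    if requested is None:
        # Null topic list: return every topic the broker knows about.
        return list(known)
    wanted = set(requested)
    # Explicit list (possibly empty): return only the topics asked for.
    return [topic for topic in known if topic in wanted]
```

With this shape an empty list has an unambiguous meaning, and only an explicit
null requests everything.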

Error Codes: I think it would be useful to describe for each new Response
type, which of the new error codes apply under what circumstances. For
example, in CreateTopic, there is a note that "Only one from (Partitions +
ReplicationFactor), ReplicaAssignment can be defined in one instruction."
Will violating this rule generate an error code? If so, which one?
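
As a sketch of the kind of check I am asking about, assuming the rule is "give
either (Partitions + ReplicationFactor) or ReplicaAssignment, never both". The
error code name below is a placeholder I made up, since the KIP text does not
say which code applies; the function itself is also hypothetical.

```python
from typing import Dict, List, Optional

NO_ERROR = "NONE"
# Placeholder name only: the KIP text does not specify which code applies.
INVALID_ARGUMENTS = "HYPOTHETICAL_INVALID_ARGUMENTS"


def validate_create_topic(
    partitions: Optional[int],
    replication_factor: Optional[int],
    replica_assignment: Optional[Dict[int, List[int]]],
) -> str:
    """Return an error code for one CreateTopic instruction (illustrative)."""
    has_counts = partitions is not None or replication_factor is not None
    has_assignment = replica_assignment is not None
    if has_counts and has_assignment:
        # Both forms were given, which the quoted rule forbids.
        return INVALID_ARGUMENTS
    if has_assignment:
        return NO_ERROR
    if partitions is not None and replication_factor is not None:
        return NO_ERROR
    # Neither form was fully specified.
    return INVALID_ARGUMENTS
```

Documenting the outcome of each branch per Response type is what I would find
useful as a client author.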

Ignoring Duplicates: "Multiple instructions for the same topic in one
request will be silently ignored, only the last from the list will be
executed." This could get confusing for clients. What are your thoughts on
treating duplicates as an error and not executing any of them w/ error code
returned? This would put the de-duplication logic burden on the client and
also make it explicitly clear what instructions were actually executed.
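
A rough Python sketch of the alternative, under one possible reading: any topic
that appears more than once gets an error and none of its instructions run,
while topics that appear once are executed normally. The error code name and
the function are hypothetical, not part of KIP-4.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Placeholder name only; not an error code defined by the KIP.
DUPLICATE_TOPIC = "HYPOTHETICAL_DUPLICATE_TOPIC"


def split_duplicates(topics: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Separate a batch into topics to execute and topics rejected as duplicates."""
    counts = Counter(topics)
    # Every topic named more than once is rejected outright.
    errors = {topic: DUPLICATE_TOPIC for topic, n in counts.items() if n > 1}
    # Topics named exactly once keep their original order.
    to_execute = [topic for topic in topics if counts[topic] == 1]
    return to_execute, errors
```

For a batch of ["a", "b", "a", "c"], only "b" and "c" would be executed and "a"
would come back with the duplicate error, so the client can see exactly what
was and was not applied.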

Request timeouts: "Because we already add a timeout field to the requests
wire protocols..." Where is this timeout specified? Is this a separate KIP
or did I miss it in KIP-4?

-Dana

On Wed, Mar 30, 2016 at 8:03 AM, Grant Henke <[email protected]> wrote:

> I didn't get anyone in attendance for this meeting. If you would like to
> discuss it please let me know.
>
> Thank you,
> Grant
>
> On Mon, Mar 28, 2016 at 9:18 AM, Grant Henke <[email protected]> wrote:
>
> > I am hoping to get more discussion and feedback around the blocking vs
> > async discussion so I can start to get KIP-4 patches reviewed.
> >
> > In order to facilitate a faster discussion I will hold an open discussion
> > on Tuesday March 29th at 12pm PST (right after the usual KIP call, if we
> > have one). Please join via the hangouts link below:
> >
> >    - https://plus.google.com/hangouts/_/cloudera.com/discuss-kip-4
> >
> > If you can't make that time, please suggest another time you would like
> > to meet and I can hold another meeting too. I will take notes of the
> > meetings and update here.
> >
> > Thank you,
> > Grant
> >
> > On Tue, Mar 15, 2016 at 9:49 AM, Grant Henke <[email protected]> wrote:
> >
> >> Moving the relevant wiki text here for discussion/tracking:
> >>>
> >>> Server-side Admin Request handlers
> >>>
> >>> At the highest level, admin requests will be handled on the brokers the
> >>> same way that all message types are. However, because admin messages
> >>> modify cluster metadata they should be handled by the controller. This
> >>> allows the controller to propagate the changes to the rest of the
> >>> cluster. However, the fact that the messages need to be handled by the
> >>> controller does not necessarily mean they need to be sent directly to
> >>> the controller. A message forwarding mechanism can be used to forward
> >>> the message from any broker to the correct broker for handling.
> >>>
> >>> Because supporting all of this is quite the undertaking, I will describe
> >>> the "ideal functionality" and then the "intermediate functionality"
> >>> that gets us some basic administrative support quickly while working
> >>> towards the optimal state.
> >>>
> >>> *Ideal Functionality:*
> >>>
> >>>    1. A client sends an admin request to *any* broker
> >>>    2. The admin request is forwarded to the required broker (likely the
> >>>    controller)
> >>>    3. The request is handled and the server blocks until a timeout is
> >>>    reached or the requested operation is completed (failure or success)
> >>>       1. An operation is considered complete/successful when *all
> >>>       required nodes have the correct/current state*.
> >>>       2. Immediate follow up requests to *any broker* will succeed.
> >>>       3. Requests that timeout may still be completed after the
> >>>       timeout. The users would need to poll to check the state.
> >>>    4. The response is generated and forwarded back to the broker that
> >>>    received the request.
> >>>    5. A response is sent back to the client.
> >>>
> >>> *Intermediate Functionality*:
> >>>
> >>>    1. A client sends an admin request to *the controller* broker
> >>>       1. As a follow up, request forwarding can be added transparently.
> >>>       (see below)
> >>>    2. The request is handled and the server blocks until a timeout is
> >>>    reached or the requested operation is completed (failure or success)
> >>>       1. An operation is considered complete/successful when *the
> >>>       controller node has the correct/current state.*
> >>>       2. Immediate follow up requests to *the controller* will succeed.
> >>>       Others (not to the controller) are likely to succeed or cause a
> >>>       retriable exception that would eventually succeed.
> >>>       3. Requests that timeout may still be completed after the
> >>>       timeout. The users would need to poll to check the state.
> >>>    3. A response is sent back to the client.
> >>>
> >>> The ideal functionality has 2 features that are more challenging
> >>> initially. For that reason those features will be removed from the
> >>> initial changes, but will be tracked as follow up improvements. However,
> >>> this intermediate solution should allow for a relatively transparent
> >>> transition to the ideal functionality.
> >>>
> >>> *Request Forwarding: KAFKA-1912
> >>> <https://issues.apache.org/jira/browse/KAFKA-1912>*
> >>>
> >>> Request forwarding is relevant to any message that needs to be sent to
> >>> the "correct" broker (ex: partition leader, group coordinator, etc).
> >>> Though at first it may seem simple, it has many technical challenges
> >>> that need to be decided with regard to connections, failure, retries,
> >>> etc. Today, we depend on the client to choose the correct broker, and
> >>> clients that want to utilize the cluster "optimally" would likely
> >>> continue to do so. For those reasons it can be handled generically as
> >>> an independent feature.
> >>>
> >>> *Cluster Consistent Blocking:*
> >>>
> >>> Blocking an admin request until the entire cluster is aware of the
> >>> correct/current state is difficult based on Kafka's current approach
> >>> for propagating metadata. This approach varies based on the metadata
> >>> that is changing.
> >>>
> >>>    - Topic metadata changes are propagated via UpdateMetadata and
> >>>    LeaderAndIsr requests
> >>>    - Config changes are propagated via zookeeper and listeners
> >>>    - ACL changes depend on the implementation of the Authorizer
> >>>    interface
> >>>       - The default SimpleAclAuthorizer uses zookeeper and listeners
> >>>
> >>> Though all of these mechanisms are different, they are all commonly
> >>> "eventually consistent". None of the mechanisms, as currently
> >>> implemented, will block until the metadata has been propagated
> >>> successfully. Changing this behavior would require a large amount of
> >>> change to the KafkaController, additional inter-broker messages, and
> >>> potentially a change to the Authorizer interface. These are all changes
> >>> that should not block the implementation of KIP-4.
> >>>
> >>> The intermediate changes in KIP-4 should allow an easy transition to
> >>> "complete blocking" when the work can be done. This is supported by
> >>> providing *optional* local blocking in the meantime. This local
> >>> blocking only blocks until the local state on the controller is
> >>> correct. We will still provide a polling mechanism for users that do
> >>> not want to block at all. A polling mechanism is required in the
> >>> optimal implementation too, because operations like "create topic" are
> >>> not transactional and users still need a way to check state after a
> >>> timeout occurs. Local blocking has the added benefit of avoiding wasted
> >>> poll requests to other brokers when it's impossible for the request to
> >>> be completed. If the controller's state is not correct, then the other
> >>> brokers' state can't be either. Clients who don't want to validate that
> >>> the entire cluster state is correct can block on the controller and
> >>> avoid polling altogether, with reasonable confidence that, though they
> >>> may get a retriable error on follow up requests, the requested change
> >>> was successful and the cluster will be accurate eventually.
> >>>
> >>> Because we already add a timeout field to the requests wire protocols,
> >>> changing the behavior to block until the cluster is consistent in the
> >>> future would not require a protocol change, though the version could be
> >>> bumped to indicate a behavior change.
> >>>
> >>
> >> Thanks,
> >> Grant
> >>
> >>
> >> On Mon, Mar 14, 2016 at 5:07 PM, Grant Henke <[email protected]> wrote:
> >>
> >>> I have been updating the KIP-4 wiki page based on the last KIP call and
> >>> wanted to get some review and discussion around the server side
> >>> implementation for admin requests, covering both the "ideal"
> >>> functionality and the "intermediate" functionality. The updates are
> >>> still in progress, but this section is the most critical and will
> >>> likely have the most discussion. This topic has had a few shifts in
> >>> perspective and various discussions on synchronous vs asynchronous
> >>> server support. The wiki contains my current perspective on the
> >>> challenges and approach.
> >>>
> >>> If you have any thoughts or feedback on the "Server-side Admin Request
> >>> handlers" section here
> >>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-2.Server-sideAdminRequesthandlers>,
> >>> let's discuss them in this thread.
> >>>
> >>> For reference the last KIP discussion can be viewed here:
> >>> https://youtu.be/rFW0-zJqg5I?t=12m30s
> >>>
> >>> Thank you,
> >>> Grant
> >>> --
> >>> Grant Henke
> >>> Software Engineer | Cloudera
> >>> [email protected] | twitter.com/gchenke | linkedin.com/in/granthenke
> >>>
