Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations

Guozhang Wang Mon, 09 Feb 2015 09:05:05 -0800

I feel the benefit of lowering the development bar for new clients is not
worth the complexity we would need to introduce on the server side. Today
the clients just need one more request type (metadata request) to send the
produce / fetch requests to the right brokers, whereas a re-routing
mechanism will result in complicated inter-broker communication patterns
that potentially impact Kafka performance and make debugging /
troubleshooting much harder.
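
To make the "one more request type" point concrete, here is a rough sketch
of the client-side routing I have in mind. The class and method names are
made up for illustration and this is not the actual client code; it only
assumes that a metadata response tells the client which broker leads each
topic partition.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a routing table a client fills from a metadata response. */
public class MetadataRouter {
    // "topic-partition" -> id of the broker currently leading that partition
    private final Map<String, Integer> leaders = new HashMap<>();

    /** Record one partition entry taken from a (hypothetical) metadata response. */
    public void update(String topic, int partition, int leaderBrokerId) {
        leaders.put(topic + "-" + partition, leaderBrokerId);
    }

    /** Broker to send produce / fetch to, or -1 if the metadata must be refreshed. */
    public int leaderFor(String topic, int partition) {
        return leaders.getOrDefault(topic + "-" + partition, -1);
    }

    public static void main(String[] args) {
        MetadataRouter router = new MetadataRouter();
        router.update("events", 0, 2);
        router.update("events", 1, 3);
        System.out.println(router.leaderFor("events", 1));
        System.out.println(router.leaderFor("unknown", 0));
    }
}
```

Once the table is filled, every produce / fetch request is a single hop to
the right broker; a miss (-1 here) just means "send another metadata
request and retry".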


An alternative way to ease the development of the clients is to use a proxy
in front of the Kafka servers, like the REST proxy we have built before,
which we use primarily for non-Java clients but which can also be treated
as handling cluster metadata discovery for clients. Compared to the
re-routing idea, the proxy also introduces two hops, but its layered
architecture is simpler.

Guozhang


On Sun, Feb 8, 2015 at 8:00 AM, Jay Kreps <[email protected]> wrote:

> Hey Jiangjie,
>
> Re-routing support doesn't force clients to use it. Java and all existing
> clients would work as now, where requests are intelligently routed by the
> client, but this would lower the bar for new clients. That said, I agree the
> case for re-routing admin commands is much stronger than for data.
>
> The idea of separating admin/metadata from data would definitely solve some
> problems but it would also add a lot of complexity--new ports, thread
> pools, etc. This is an interesting idea to think over but I'm not sure if
> it's worth it. Probably a separate effort in any case.
>
> -jay
>
> On Friday, February 6, 2015, Jiangjie Qin <[email protected]>
> wrote:
>
> > I'm a little bit concerned about the request routers among brokers.
> > Typically we have a dominant percentage of produce and fetch
> > request/response. Routing them from one broker to another seems
> > undesirable.
> > Also I think we generally have two types of requests/responses: data
> > related and admin related. It is typically a good practice to separate
> > data plane from control plane. That suggests we should have another admin
> > port to serve those admin requests and probably have different
> > authentication/authorization from the data port.
> >
> > Jiangjie (Becket) Qin
> >
> > On 2/6/15, 11:18 AM, "Joe Stein" <[email protected]> wrote:
> >
> > >I updated the installation and sample usage for the existing patches on
> > >the KIP site
> > >
> > >https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations
> > >
> > >There are still a few pending items here.
> > >
> > >1) There was already some discussion about using the Broker that is the
> > >Controller here https://issues.apache.org/jira/browse/KAFKA-1772 and we
> > >should elaborate on that more in the thread or agree we are ok with admin
> > >asking for the controller to talk to and then just sending that broker the
> > >admin tasks.
> > >
> > >2) I like this idea https://issues.apache.org/jira/browse/KAFKA-1912 but
> > >we can refactor after KAFKA-1694 is committed, no? I know folks just want
> > >to talk to the broker that is the controller. It may even become useful to
> > >have the controller run on a broker that isn't even a topic broker anymore
> > >(small can of worms I am opening here but it elaborates on Guozhang's hot
> > >spot point).
> > >
> > >3) Any more feedback?
> > >
> > >- Joe Stein
> > >
> > >On Fri, Jan 23, 2015 at 3:15 PM, Guozhang Wang <[email protected]> wrote:
> > >
> > >> A centralized admin operation protocol would be very useful.
> > >>
> > >> One more general comment here is that the controller was originally
> > >> designed to only talk to other brokers through ControllerChannel, while
> > >> the broker instance which carries the current controller is agnostic of
> > >> its existence, and uses KafkaApis to handle general Kafka requests.
> > >> Having all admin requests redirected to the controller instance will
> > >> force the broker to be aware of its carried controller, and access its
> > >> internal data for handling these requests. Plus with the number of
> > >> clients out of Kafka's control, this may easily cause the controller to
> > >> be a hot spot in terms of request load.
> > >>
> > >>
> > >> On Thu, Jan 22, 2015 at 10:09 PM, Joe Stein <[email protected]> wrote:
> > >>
> > >> > inline
> > >> >
> > >> > On Thu, Jan 22, 2015 at 11:59 PM, Jay Kreps <[email protected]> wrote:
> > >> >
> > >> > > Hey Joe,
> > >> > >
> > >> > > This is great. A few comments on KIP-4
> > >> > >
> > >> > > 1. This is much needed functionality, but there are a lot of them so
> > >> > > let's really think these protocols through. We really want to end up
> > >> > > with a set of well thought-out, orthogonal apis. For this reason I
> > >> > > think it is really important to think through the end state even if
> > >> > > that includes APIs we won't implement in the first phase.
> > >> > >
> > >> >
> > >> > ok
> > >> >
> > >> >
> > >> > >
> > >> > > 2. Let's please please please wait until we have switched the server
> > >> > > over to the new java protocol definitions. If we add umpteen more ad
> > >> > > hoc scala objects that is just generating more work for the conversion
> > >> > > we know we have to do.
> > >> > >
> > >> >
> > >> > ok :)
> > >> >
> > >> >
> > >> > >
> > >> > > 3. This proposal introduces a new type of optional parameter. This is
> > >> > > inconsistent with everything else in the protocol where we use -1 or
> > >> > > some other marker value. You could argue either way but let's stick
> > >> > > with that for consistency. For clients that implemented the protocol
> > >> > > in a better way than our scala code these basic primitives are hard to
> > >> > > change.
> > >> > >
> > >> >
> > >> > yes, less confusing, ok.
> > >> >
> > >> >
> > >> > >
> > >> > > 4. ClusterMetadata: This seems to duplicate TopicMetadataRequest which
> > >> > > has brokers, topics, and partitions. I think we should rename that
> > >> > > request ClusterMetadataRequest (or just MetadataRequest) and include
> > >> > > the id of the controller. Or are there other things we could add here?
> > >> > >
> > >> >
> > >> > We could add broker version to it.
> > >> >
> > >> >
> > >> > >
> > >> > > 5. We have a tendency to try to make a lot of requests that can only
> > >> > > go to particular nodes. This adds a lot of burden for client
> > >> > > implementations (it sounds easy but each discovery can fail in many
> > >> > > parts so it ends up being a full state machine to do right). I think
> > >> > > we should consider making admin commands and ideally as many of the
> > >> > > other apis as possible available on all brokers and just redirect to
> > >> > > the controller on the broker side. Perhaps there would be a general
> > >> > > way to encapsulate this re-routing behavior.
> > >> > >
> > >> >
> > >> > If we do that then we should also preserve what we have and do both.
> > >> > The client can then decide "do I want to go to any broker and proxy" or
> > >> > just "go to controller and run admin task". Lots of folks have seen
> > >> > controllers come under distress because of their producers/consumers.
> > >> > There is a ticket too for controller elect and re-elect
> > >> > https://issues.apache.org/jira/browse/KAFKA-1778 so you can force it to
> > >> > a broker that has 0 load.
> > >> >
> > >> >
> > >> > >
> > >> > > 6. We should probably normalize the key value pairs used for configs
> > >> > > rather than embedding a new formatting. So two strings rather than one
> > >> > > with an internal equals sign.
> > >> > >
> > >> >
> > >> > ok
> > >> >
> > >> >
> > >> > >
> > >> > > 7. Is the postcondition of these APIs that the command has begun or
> > >> > > that the command has been completed? It is a lot more usable if the
> > >> > > command has been completed so you know that if you create a topic and
> > >> > > then publish to it you won't get an exception about there being no
> > >> > > such topic.
> > >> > >
> > >> >
> > >> > We should define that more. There needs to be some more state there,
> > >> > yes.
> > >> >
> > >> > We should try to cover https://issues.apache.org/jira/browse/KAFKA-1125
> > >> > within what we come up with.
> > >> >
> > >> >
> > >> > >
> > >> > > 8. Describe topic and list topics duplicate a lot of stuff in the
> > >> > > metadata request. Is there a reason to give back topics marked for
> > >> > > deletion? I feel like if we just make the post-condition of the delete
> > >> > > command be that the topic is deleted that will get rid of the need for
> > >> > > this, right? And it will be much more intuitive.
> > >> > >
> > >> >
> > >> > I will go back and look through it.
> > >> >
> > >> >
> > >> > >
> > >> > > 9. Should we consider batching these requests? We have generally tried
> > >> > > to allow multiple operations to be batched. My suspicion is that
> > >> > > without this we will get a lot of code that does something like
> > >> > >    for(topic: adminClient.listTopics())
> > >> > >       adminClient.describeTopic(topic)
> > >> > > This code will work great when you test on 5 topics but not do as well
> > >> > > if you have 50k.
> > >> > >
> > >> >
> > >> > So => Input is a list of topics (or none for all) and a batch response
> > >> > from the controller (which could be routed through another broker) of
> > >> > the entire response? We could introduce a Batch keyword to explicitly
> > >> > show the usage of it.
> > >> >
> > >> >
> > >> > > 10. I think we should also discuss how we want to expose a
> > >> > > programmatic JVM client api for these operations. Currently people
> > >> > > rely on AdminUtils which is totally sketchy. I think we probably need
> > >> > > another client under clients/ that exposes administrative
> > >> > > functionality. We will need this just to properly test the new apis, I
> > >> > > suspect. We should figure out that API.
> > >> > >
> > >> >
> > >> > We were talking about that here
> > >> > https://issues.apache.org/jira/browse/KAFKA-1774 and wrote it in java
> > >> > https://reviews.apache.org/r/29301/diff/7/?page=4#75 so we could do
> > >> > something like that, sure.
> > >> >
> > >> >
> > >> > >
> > >> > > 11. The other information that would be really useful to get would be
> > >> > > information about partitions--how much data is in the partition, what
> > >> > > are the segment offsets, what is the log-end offset (i.e. last
> > >> > > offset), what is the compaction point, etc. I think that done right
> > >> > > this would be the successor to the very awkward OffsetRequest we have
> > >> > > today.
> > >> > >
> > >> >
> > >> > yes!
> > >> >
> > >> >
> > >> > >
> > >> > > -Jay
> > >> > >
> > >> > > On Wed, Jan 21, 2015 at 10:27 PM, Joe Stein <[email protected]> wrote:
> > >> > >
> > >> > > > Hi, created a KIP
> > >> > > >
> > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations
> > >> > > >
> > >> > > > JIRA https://issues.apache.org/jira/browse/KAFKA-1694
> > >> > > >
> > >> > > > /*******************************************
> > >> > > >  Joe Stein
> > >> > > >  Founder, Principal Consultant
> > >> > > >  Big Data Open Source Security LLC
> > >> > > >  http://www.stealth.ly
> > >> > > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > >> > > > ********************************************/
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> -- Guozhang
> > >>
> >
> >
>



-- 
-- Guozhang
