Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

Colin McCabe Wed, 01 Mar 2017 13:58:49 -0800

Hi all,

Thanks for commenting, everyone.  Does anyone have more questions or
comments, or should we vote?  The latest proposal is up at
https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%3A+Add+a+public+AdminClient+API+for+Kafka+admin+operations


best,
Colin


On Thu, Feb 16, 2017, at 15:00, Colin McCabe wrote:
> On Thu, Feb 16, 2017, at 14:11, Dong Lin wrote:
> > Hey Colin,
> > 
> > Thanks for the update. I have two comments:
> > 
> > - I actually think it is simpler and good enough to have per-topic API
> > instead of batch-of-topic API. This is different from the argument for
> > batch-of-partition API because, unlike operation on topic, people usually
> > operate on multiple partitions (e.g. seek(), purge()) at a time. Is there
> > performance concern with per-topic API? I am wondering if we should do
> > per-topic API until there is use-case or performance benefits of
> > batch-of-topic API.
> 
> Yes, there is a performance concern with only supporting operations on
> one topic at a time.  Jay expressed this in some of his earlier emails
> and some other people did as well.  We have cases in mind for management
> software where many topics are created at once.
> 
> > 
> > - Currently we have interface "Consumer" and "Producer". And we also have
> > implementations of these two interfaces as "KafkaConsumer" and
> > "KafkaProducer". If we follow the same naming pattern, should we have
> > interface "AdminClient" and the implementation "KafkaAdminClient",
> > instead
> > of the other way around?
> 
> That's a good point.  We should do that for consistency.
> 
> best,
> Colin
> 
> > 
> > Dong
> > 
> > 
> > On Thu, Feb 16, 2017 at 10:19 AM, Colin McCabe <[email protected]>
> > wrote:
> > 
> > > Hi all,
> > >
> > > So I think people have made some very good points so far.  There seems
> > > to be agreement that we need to have explicit batch APIs for the sake of
> > > efficiency, so I added that back.
> > >
> > > Contexts seem a little more complex than we thought, so I removed that
> > > from the proposal.
> > >
> > > I removed the Impl class.  Instead, we now have a KafkaAdminClient
> > > interface and an AdminClient implementation.  I think this matches our
> > > other code better, as Jay commented.
> > >
> > > Each call now has an "Options" object that is passed in.  This will
> > > allow us to easily add new parameters to the calls without having tons
> > > of function overloads.  Similarly, each call now has a Results object,
> > > which will let us easily extend the results we are returning if needed.
> > >
> > > Many people made the point that Java 7 Futures are not that useful, but
> > > Java 8 CompletableFutures are.  With CompletableFutures, you can chain
> > > calls, adapt them, join them-- basically all the stuff people are doing
> > > in Node.js and Twisted Python.  Java 7 Futures don't really let you do
> > > anything but poll for a value or block.  So I felt that it was better to
> > > just go with a CompletableFuture-based API.
> > >
> > > People also made the point that they would like an easy API for waiting
> > > on complete success of a batch call.  For example, an API that would
> > > fail if even one topic wasn't created in createTopics.  So I came up
> > > with Result objects that provide multiple futures that you can wait on.
> > > You can wait on a future that fires when everything is complete, or you
> > > can wait on futures for individual topics being created.
> > >
> > > I updated the wiki, so please take a look.  Note that this new API
> > > requires JDK8.  It seems like JDK8 is coming soon, though, and the
> > > disadvantages of sticking to Java 7 are pretty big here, I think.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, Feb 13, 2017, at 11:51, Colin McCabe wrote:
> > > > On Sun, Feb 12, 2017, at 09:21, Jay Kreps wrote:
> > > > > Hey Colin,
> > > > >
> > > > > Thanks for the hard work on this. I know going back and forth on APIs
> > > is
> > > > > kind of frustrating but we're at the point where these things live 
> > > > > long
> > > > > enough and are used by enough people that it is worth the pain. I'm
> > > sure
> > > > > it'll come down in the right place eventually. A couple things I've
> > > found
> > > > > helped in the past:
> > > > >
> > > > >    1. The burden of evidence needs to fall on the complicator. i.e. if
> > > > >    person X thinks the api should be async they need to produce a set
> > > of
> > > > >    common use cases that require this. Otherwise you are perpetually
> > > > >    having to
> > > > >    think "we might need x". I think it is good to have a rule of
> > > "simple
> > > > >    until
> > > > >    proven insufficient".
> > > > >    2. Make sure we frame things for the intended audience. At this
> > > point
> > > > >    our apis get used by a very broad set of Java engineers. This is a
> > > > >    very
> > > > >    different audience from our developer mailing list. These people
> > > code
> > > > >    for a
> > > > >    living not necessarily as a passion, and may not understand details
> > > of
> > > > >    the
> > > > >    internals of our system or even basic things like multi-threaded
> > > > >    programming. I don't think this means we want to dumb things down,
> > > but
> > > > >    rather try really hard to make things truly simple when possible.
> > > > >
> > > > > Okay here were a couple of comments:
> > > > >
> > > > >    1. Conceptually what is a TopicContext? I think it means something
> > > > >    like
> > > > >    TopicAdmin? It is not literally context about Topics right? What is
> > > > >    the
> > > > >    relationship of Contexts to clients? Is there a threadsafety
> > > > >    difference?
> > > > >    Would be nice to not have to think about this, this is what I mean
> > > by
> > > > >    "conceptual weight". We introduce a new concept that is a bit
> > > nebulous
> > > > >    that
> > > > >    I have to figure out to use what could be a simple api. I'm sure
> > > > >    you've
> > > > >    been through this experience before where you have these various
> > > > >    objects
> > > > >    and you're trying to figure out what they represent (the connection
> > > to
> > > > >    the
> > > > >    server? the information to create a connection? a request 
> > > > > session?).
> > > >
> > > > The intention was to provide some grouping of methods, and also a place
> > > > to put request parameters which were often set to defaults rather than
> > > > being explicitly set.  If it seems complex, we can certainly get rid of
> > > > it.
> > > >
> > > > >    2. We've tried to avoid the Impl naming convention. In general the
> > > > >    rule
> > > > >    has been if there is only going to be one implementation you don't
> > > > >    need an
> > > > >    interface. If there will be multiple, distinguish it from the
> > > others.
> > > > >    The
> > > > >    other clients follow this pattern: Producer, KafkaProducer,
> > > > >    MockProducer;
> > > > >    Consumer, KafkaConsumer, MockConsumer.
> > > >
> > > > Good point.  Let's change the interface to KafkaAdminClient, and the
> > > > implementation to AdminClient.
> > > >
> > > > >    3. We generally don't use setters or getters as a naming
> > > convention. I
> > > > >    personally think mutating the setting in place seems kind of like
> > > late
> > > > >    90s
> > > > >    Java style. I think it likely has thread-safety issues. i.e. even 
> > > > > if
> > > > >    it is
> > > > >    volatile you may not get the value you just set if there is another
> > > > >    thread... I actually really liked what you described as your
> > > original
> > > > >    idea
> > > > >    of having a single parameter object like CreateTopicRequest that
> > > holds
> > > > >    all
> > > > >    these parameters and defaults. This lets you evolve the api with 
> > > > > all
> > > > >    the
> > > > >    various combinations of arguments without overloading insanity.
> > > After
> > > > >    doing
> > > > >    literally tens of thousands of remote APIs at LinkedIn we 
> > > > > eventually
> > > > >    converged on a rule, which is ultimately every remote api needs a
> > > > >    single
> > > > >    argument object you can add to over time and it must be batched.
> > > Which
> > > > >    brings me to my next point...
> > > >
> > > > Just to clarify, volatiles were never a part of the proposal.  I think
> > > > that context objects or request objects should be used by a single
> > > > thread at a time.
> > > >
> > > > I'm not opposed to request objects, but I think they raise all the same
> > > > questions as context objects.  Basically, the thread-safety issues need
> > > > to be spelled out and understood by the user, and the user needs more
> > > > lines of code to make a request.  And there will be people trying to do
> > > > things like re-use request objects when they should not, and so forth.
> > > >
> > > > >    4. I agree batch apis are annoying but I suspect we'll end up 
> > > > > adding
> > > > >    one. Doing 1000 requests for 1000 operations if creating or 
> > > > > deleting
> > > > >    will
> > > > >    be bad, right? This won't be the common case, but when you do it it
> > > > >    will be
> > > > >    a deal-breaker problem. I don't think we should try to fix this one
> > > > >    behind
> > > > >    the scenes.
> > > > >    5. Are we going to do CompletableFuture (which requires java 8) or
> > > > >    normal Future? Normal Future is utterly useless for most things
> > > other
> > > > >    than
> > > > >    just calling wait. If we can evolve in place from Future to
> > > > >    CompletableFuture that is fantastic (we could do it for the 
> > > > > producer
> > > > >    too!).
> > > > >    My belief was that this was binary incompatible but I actually 
> > > > > don't
> > > > >    know
> > > > >    (obviously it's source compatible).
> > > >
> > > > In my testing, replacing a return type with a subclass of that return
> > > > type did not break binary compatibility.  I haven't been able to find
> > > > chapter and verse on this from the Java implementers, though.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > >
> > > > > -Jay
> > > > >
> > > > > On Wed, Feb 8, 2017 at 5:00 PM, Colin McCabe <[email protected]>
> > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I made some major revisions to the proposal on the wiki, so please
> > > check
> > > > > > it out.
> > > > > >
> > > > > > The new API is based on Ismael's suggestion of grouping related 
> > > > > > APIs.
> > > > > > There is only one layer of grouping.  I think that it's actually
> > > pretty
> > > > > > intuitive.  It's also based on the idea of using Futures, which
> > > several
> > > > > > people commented that they'd like to see.
> > > > > >
> > > > > > Here's a simple example:
> > > > > >
> > > > > >  > AdminClient client = new AdminClientImpl(myConfig);
> > > > > >  > try {
> > > > > >  >   client.topics().create("foo", 3, (short) 2, false).get();
> > > > > >  >   Collection<String> topicNames = client.topics().list(false).
> > > get();
> > > > > >  >   log.info("Found topics: {}", Utils.mkString(topicNames, ", "));
> > > > > >  >   Collection<Node> nodes = client.nodes().list().get();
> > > > > >  >   log.info("Found cluster nodes: {}", Utils.mkString(nodes, ",
> > > "));
> > > > > >  > } finally {
> > > > > >  >   client.close();
> > > > > >  > }
> > > > > >
> > > > > > The good thing is, there is no Try, no 'get' prefixes, no messing
> > > with
> > > > > > batch APIs.  If there is an error, then Future#get() throws an
> > > > > > ExecutionException which wraps the relevant exception in the 
> > > > > > standard
> > > > > > Java way.
> > > > > >
> > > > > > Here's a slightly less simple example:
> > > > > >
> > > > > > > AdminClient client = new AdminClientImpl(myConfig);
> > > > > > > try {
> > > > > > >   List<Future<Void>> futures = new LinkedList<>();
> > > > > > >   for (String topicName: myNewTopicNames) {
> > > > > > >     creations.add(client.topics().
> > > > > > >         setClientTimeout(30000).setCreationConfig(myTopicConfig).
> > > > > > >           create(topicName, 3, (short) 2, false));
> > > > > > >   }
> > > > > > >   Futures.waitForAll(futures);
> > > > > > > } finally {
> > > > > > >   client.close();
> > > > > > > }
> > > > > >
> > > > > > I went with Futures because I feel like ought to have some option 
> > > > > > for
> > > > > > doing async.  It's a style of programming that has become a lot more
> > > > > > popular with the rise of Node.js, Twisted python, etc. etc.  Also, 
> > > > > > as
> > > > > > Ismael commented, Java 8 CompletableFuture is going to make Java's
> > > > > > support for fluent async programming a lot stronger by allowing call
> > > > > > chaining and much more.
> > > > > >
> > > > > > If we are going to support async, the simplest thing is just to make
> > > > > > everything return a future and let people call get() if they want to
> > > run
> > > > > > synchronously.  Having a mix of async and sync APIs is just going to
> > > be
> > > > > > confusing and redundant.
> > > > > >
> > > > > > I think we should try to avoid creating single functions that start
> > > > > > multiple requests if we can.  It makes things much uglier.  It means
> > > > > > that you have to have some kind of request class that wraps up the
> > > > > > request the user is trying to create, so that you can handle an
> > > array of
> > > > > > those requests.  The return value has to be something like Map<Node,
> > > > > > Try<Value>> to represent which nodes failed and succeeded.  This is
> > > the
> > > > > > kind of stuff that, in my opinion, makes people scratch their heads.
> > > > > >
> > > > > > If we need to, we can still get some of the efficiency benefits of
> > > batch
> > > > > > APIs by waiting for a millisecond or two before sending out a topic
> > > > > > create() request to see if other create() requests arrive.  If so, 
> > > > > > we
> > > > > > can coalesce them.  It might be better to figure out if this is an
> > > > > > actual performance issue before implementing it, though.
> > > > > >
> > > > > > I think it would be good to get something out there, annotate it as
> > > > > > @Unstable, and get feedback from people building against trunk and
> > > using
> > > > > > it.  We have removed or changed @Unstable APIs in streams before, so
> > > I
> > > > > > don't think we should worry that it will get set in stone
> > > prematurely.
> > > > > > The AdminClient API should get much less developer use than anything
> > > in
> > > > > > streams, so changing an unstable API should be much easier.
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Wed, Feb 8, 2017, at 07:49, Ismael Juma wrote:
> > > > > > > Thanks for elaborating Jay. I totally agree that we have to be 
> > > > > > > very
> > > > > > > careful
> > > > > > > in how we use our complexity budget. Easier said than done when
> > > people
> > > > > > > don't agree on what is complex and what is simple. :) For example,
> > > I
> > > > > > > think
> > > > > > > batch APIs are a significant source of complexity as you have to
> > > do a
> > > > > > > bunch
> > > > > > > of ceremony to group things before sending the request and error
> > > handling
> > > > > > > becomes more complex due to partial failures (things like `Try` or
> > > other
> > > > > > > mechanisms that serve a similar role are then needed).
> > > > > > >
> > > > > > > Maybe a way forward is to write API usage examples to help
> > > validate that
> > > > > > > the suggested API is indeed easy to use.
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Wed, Feb 8, 2017 at 4:40 AM, Jay Kreps <[email protected]>
> > > wrote:
> > > > > > >
> > > > > > > > Totally agree on CompletableFuture. Also agree with some of the
> > > rough
> > > > > > edges
> > > > > > > > on the Consumer.
> > > > > > > >
> > > > > > > > I don't have much of a leg to stand on with the splitting vs not
> > > > > > splitting
> > > > > > > > thing, really hard to argue one or the other is better. I guess
> > > the one
> > > > > > > > observation in watching us try to make good public apis over the
> > > years
> > > > > > is I
> > > > > > > > am kind of in favor of a particular kind of simple. In
> > > particular I
> > > > > > think
> > > > > > > > since the bar is sooo high in support and docs and the community
> > > of
> > > > > > users
> > > > > > > > so broad in the range of their capabilities, it makes it so
> > > there is a
> > > > > > lot
> > > > > > > > of value in dead simple interfaces that don't have a lot of
> > > conceptual
> > > > > > > > weight, don't introduce a lot of new classes or concepts or
> > > general
> > > > > > > > patterns that must be understood to use them correctly. So
> > > things like
> > > > > > > > nesting, or the Try class, or async apis, or even just a complex
> > > set of
> > > > > > > > classes representing arguments or return values kind of all 
> > > > > > > > stack
> > > > > > > > conceptual burdens on the user to figure out correct usage. So
> > > like,
> > > > > > for
> > > > > > > > example, the Try class is very elegant and represents a whole
> > > > > > generalized
> > > > > > > > class of possibly completed actions, but the flip side is maybe
> > > I'm
> > > > > > just a
> > > > > > > > working guy who needs to list his kafka topics but doesn't know
> > > Rust,
> > > > > > take
> > > > > > > > pity on me! :-)
> > > > > > > >
> > > > > > > > Nit picking aside, super excited to see us progress on this.
> > > > > > > >
> > > > > > > > -Jay
> > > > > > > >
> > > > > > > > On Tue, Feb 7, 2017 at 3:46 PM Ismael Juma <[email protected]>
> > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jay,
> > > > > > > > >
> > > > > > > > > Thanks for the feedback. Comments inline.
> > > > > > > > >
> > > > > > > > > On Tue, Feb 7, 2017 at 8:18 PM, Jay Kreps <[email protected]>
> > > wrote:
> > > > > > > > > >
> > > > > > > > > >    - I think it would be good to not use "get" as the prefix
> > > for
> > > > > > things
> > > > > > > > > >    making remote calls. We've tried to avoid the java getter
> > > > > > convention
> > > > > > > > > >    entirely (see code style guide), but for remote calls in
> > > > > > particular
> > > > > > > > it
> > > > > > > > > > kind
> > > > > > > > > >    of blurs the line between field access and remote RPC in
> > > a way
> > > > > > that
> > > > > > > > > > leads
> > > > > > > > > >    people to trouble. What about, e.g., fetchAllGroups() vs
> > > > > > > > > getAllGroups().
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Agreed that we should avoid the `get` prefix for remote calls.
> > > There
> > > > > > are
> > > > > > > > a
> > > > > > > > > few possible prefixes for the read operations: list, fetch,
> > > describe.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - I think futures and callbacks are a bit of a pain to
> > > use. I'd
> > > > > > > > second
> > > > > > > > > >    Becket's comment: let's ensure there a common use case
> > > > > > motivating
> > > > > > > > > these
> > > > > > > > > >    that wouldn't be just as easily satisfied with batch
> > > operations
> > > > > > > > (which
> > > > > > > > > > we
> > > > > > > > > >    seem to have at least for some things). In terms of
> > > flexibility
> > > > > > > > > > Callbacks >
> > > > > > > > > >    Futures > Batch Ops but I think in terms of usability it
> > > is the
> > > > > > > > exact
> > > > > > > > > >    opposite so let's make sure we have worked out how the
> > > API will
> > > > > > be
> > > > > > > > > used
> > > > > > > > > >    before deciding. In particular I think java Futures are
> > > often an
> > > > > > > > > >    uncomfortable half-way point since calling get() and
> > > blocking
> > > > > > the
> > > > > > > > > > thread is
> > > > > > > > > >    often not what you want for chaining sequences of
> > > operations in
> > > > > > a
> > > > > > > > > truly
> > > > > > > > > >    async way, so 99% of people just use the future as a way
> > > to
> > > > > > batch
> > > > > > > > > calls.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > We should definitely figure out how the APIs are going to be
> > > used
> > > > > > before
> > > > > > > > > deciding. I agree that callbacks are definitely painful and
> > > there's
> > > > > > > > little
> > > > > > > > > reason to expose them in a modern API unless it's meant to be
> > > very
> > > > > > low
> > > > > > > > > level. When it comes to Futures, I think it's important to
> > > > > > distinguish
> > > > > > > > what
> > > > > > > > > is available in Java 7 and below versus what is available from
> > > Java 8
> > > > > > > > > onwards. CompletableFuture makes it much easier to
> > > compose/chain
> > > > > > > > operations
> > > > > > > > > (in a similar vein to java.util.Stream, our own Streams API,
> > > etc.)
> > > > > > and it
> > > > > > > > > gives you the ability to register callbacks if you really want
> > > to
> > > > > > > > (avoiding
> > > > > > > > > the somewhat odd situation in the Producer where we return a
> > > Future
> > > > > > _and_
> > > > > > > > > allow you to pass a callback).
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - Personally I don't think splitting the admin methods up
> > > > > > actually
> > > > > > > > > makes
> > > > > > > > > >    things more usable. It just makes you have to dig through
> > > our
> > > > > > > > > > hierarchy. I
> > > > > > > > > >    think a flat class with a bunch of operations (like the
> > > consumer
> > > > > > > > api)
> > > > > > > > > is
> > > > > > > > > >    probably the easiest for people to grok and find things
> > > on. I am
> > > > > > > > kind
> > > > > > > > > > of a
> > > > > > > > > >    dumb PHP programmer at heart, though.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I am not sure it's fair to compare the AdminClient with the
> > > > > > Consumer. The
> > > > > > > > > former has APIs for a bunch of unrelated APIs (topics, ACLs,
> > > configs,
> > > > > > > > > consumer groups, delegation tokens, preferred leader election,
> > > > > > partition
> > > > > > > > > reassignment, etc.) where the latter is pretty specialised.
> > > For each
> > > > > > of
> > > > > > > > the
> > > > > > > > > resources, you may have 3-4 operations, it will get confusing
> > > fast.
> > > > > > Also,
> > > > > > > > > do you really think an API that has one level of grouping will
> > > mean
> > > > > > that
> > > > > > > > > users have to "dig through our hierarchy"? Or are you
> > > concerned that
> > > > > > once
> > > > > > > > > we go in that direction, there is a danger of making the
> > > hierarchy
> > > > > > more
> > > > > > > > > complicated?
> > > > > > > > >
> > > > > > > > > Finally, I am not sure I would use the consumer as an example
> > > of
> > > > > > > > something
> > > > > > > > > that is easy to grok. :) The fact that methods behave pretty
> > > > > > differently
> > > > > > > > > (some are blocking while others only have an effect after
> > > poll) with
> > > > > > no
> > > > > > > > > indication from the type signature or naming convention makes
> > > it
> > > > > > harder,
> > > > > > > > > not easier, to understand.
> > > > > > > > >
> > > > > > > > > Ismael
> > > > > > > > >
> > > > > > > >
> > > > > >
> > >

Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

Reply via email to