Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

Colin McCabe Thu, 09 Feb 2017 10:51:29 -0800

On Wed, Feb 8, 2017, at 19:02, Dong Lin wrote:
> I am not aware of any semantics that will be caused by sharing
> NetworkClient between producer/consumer and AdminClient. But I agree that
> there is currently no good way to share such an internal class between
> them. And yes, goal is to reduce number of connections. For example, say
> we
> want to enable auto data purge based on committed offset using
> AdminClient.purgeDataBefore(...) in a stream processing application, then
> in addition to producer or consumer, we will now have AdminClient in
> every
> job. It means that the the number of connection between server and client
> will double.
> 
> I have another comment on the KIP. Is AdminClient API supposed to be
> thread
> safe?


Yes.  The wiki page states: "The client will be multi-threaded; multiple
threads will be able to safely make calls using the same AdminClient
object."

> If so, should we mark private variables such as clientTimeoutMs to
> be @volatile? Would it be a concern if two threads call
> TopicsContext.setServerTimeout(...) concurrently to use different timeout
> for their own use-case?

The context objects are extremely lightweight and are not intended to be
shared between multiple threads.  So each thread would just do
client.topics().setClientTimeout(...).create(), and operate on its own
TopicsContext.  I will add that to the wiki.

best,
Colin

> 
> Thanks,
> Dong
> 
> On Wed, Feb 8, 2017 at 6:50 PM, Jason Gustafson <[email protected]>
> wrote:
> 
> > I'm not too sure sharing NetworkClient is a good idea. The consumer and the
> > producer both have request semantics which would be more difficult to
> > reason about if the connections are shared with another client. Also, the
> > NetworkClient is an internal class which is not really meant for users. Do
> > we really want to open that up? Is the only benefit saving the number of
> > connections? Seems not worth it in my opinion.
> >
> > -Jason
> >
> > On Wed, Feb 8, 2017 at 6:43 PM, Dong Lin <[email protected]> wrote:
> >
> > > BTW, the idea to share NetworkClient is suggested by Radai and I like
> > this
> > > idea.
> > >
> > > On Wed, Feb 8, 2017 at 6:39 PM, Dong Lin <[email protected]> wrote:
> > >
> > > > Hey Colin,
> > > >
> > > > Thanks for updating the KIP. I have two followup questions:
> > > >
> > > > - It seems that setCreationConfig(...) is a bit redundant given that
> > most
> > > > arguments (e.g. topic name, partition num) are already passed to
> > > > TopicsContext.create(...) when user creates topic. Should we pass
> > > > the creationConfig as a parameter to TopicsContext.create(..)?
> > > >
> > > > - I am wondering if we should also specify the constructor of the
> > > > AdminClient in the KIP. Previously we agreed that AdminClient should
> > have
> > > > its own thread to poll NetworkClient to send/receive messages. Should
> > we
> > > > also allow AdminClient to use an existing NetworkClient that is
> > provided
> > > to
> > > > the constructor? This would allow AdminClient to share NetworkClient
> > with
> > > > producer or consumer in order to reduce the total number of open
> > sockets
> > > on
> > > > both client and server.
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > > > On Wed, Feb 8, 2017 at 5:00 PM, Colin McCabe <[email protected]>
> > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I made some major revisions to the proposal on the wiki, so please
> > check
> > > >> it out.
> > > >>
> > > >> The new API is based on Ismael's suggestion of grouping related APIs.
> > > >> There is only one layer of grouping.  I think that it's actually
> > pretty
> > > >> intuitive.  It's also based on the idea of using Futures, which
> > several
> > > >> people commented that they'd like to see.
> > > >>
> > > >> Here's a simple example:
> > > >>
> > > >>  > AdminClient client = new AdminClientImpl(myConfig);
> > > >>  > try {
> > > >>  >   client.topics().create("foo", 3, (short) 2, false).get();
> > > >>  >   Collection<String> topicNames = client.topics().list(false).
> > get();
> > > >>  >   log.info("Found topics: {}", Utils.mkString(topicNames, ", "));
> > > >>  >   Collection<Node> nodes = client.nodes().list().get();
> > > >>  >   log.info("Found cluster nodes: {}", Utils.mkString(nodes, ",
> > "));
> > > >>  > } finally {
> > > >>  >   client.close();
> > > >>  > }
> > > >>
> > > >> The good thing is, there is no Try, no 'get' prefixes, no messing with
> > > >> batch APIs.  If there is an error, then Future#get() throws an
> > > >> ExecutionException which wraps the relevant exception in the standard
> > > >> Java way.
> > > >>
> > > >> Here's a slightly less simple example:
> > > >>
> > > >> > AdminClient client = new AdminClientImpl(myConfig);
> > > >> > try {
> > > >> >   List<Future<Void>> futures = new LinkedList<>();
> > > >> >   for (String topicName: myNewTopicNames) {
> > > >> >     creations.add(client.topics().
> > > >> >         setClientTimeout(30000).setCreationConfig(myTopicConfig).
> > > >> >           create(topicName, 3, (short) 2, false));
> > > >> >   }
> > > >> >   Futures.waitForAll(futures);
> > > >> > } finally {
> > > >> >   client.close();
> > > >> > }
> > > >>
> > > >> I went with Futures because I feel like ought to have some option for
> > > >> doing async.  It's a style of programming that has become a lot more
> > > >> popular with the rise of Node.js, Twisted python, etc. etc.  Also, as
> > > >> Ismael commented, Java 8 CompletableFuture is going to make Java's
> > > >> support for fluent async programming a lot stronger by allowing call
> > > >> chaining and much more.
> > > >>
> > > >> If we are going to support async, the simplest thing is just to make
> > > >> everything return a future and let people call get() if they want to
> > run
> > > >> synchronously.  Having a mix of async and sync APIs is just going to
> > be
> > > >> confusing and redundant.
> > > >>
> > > >> I think we should try to avoid creating single functions that start
> > > >> multiple requests if we can.  It makes things much uglier.  It means
> > > >> that you have to have some kind of request class that wraps up the
> > > >> request the user is trying to create, so that you can handle an array
> > of
> > > >> those requests.  The return value has to be something like Map<Node,
> > > >> Try<Value>> to represent which nodes failed and succeeded.  This is
> > the
> > > >> kind of stuff that, in my opinion, makes people scratch their heads.
> > > >>
> > > >> If we need to, we can still get some of the efficiency benefits of
> > batch
> > > >> APIs by waiting for a millisecond or two before sending out a topic
> > > >> create() request to see if other create() requests arrive.  If so, we
> > > >> can coalesce them.  It might be better to figure out if this is an
> > > >> actual performance issue before implementing it, though.
> > > >>
> > > >> I think it would be good to get something out there, annotate it as
> > > >> @Unstable, and get feedback from people building against trunk and
> > using
> > > >> it.  We have removed or changed @Unstable APIs in streams before, so I
> > > >> don't think we should worry that it will get set in stone prematurely.
> > > >> The AdminClient API should get much less developer use than anything
> > in
> > > >> streams, so changing an unstable API should be much easier.
> > > >>
> > > >> best,
> > > >> Colin
> > > >>
> > > >>
> > > >> On Wed, Feb 8, 2017, at 07:49, Ismael Juma wrote:
> > > >> > Thanks for elaborating Jay. I totally agree that we have to be very
> > > >> > careful
> > > >> > in how we use our complexity budget. Easier said than done when
> > people
> > > >> > don't agree on what is complex and what is simple. :) For example, I
> > > >> > think
> > > >> > batch APIs are a significant source of complexity as you have to do
> > a
> > > >> > bunch
> > > >> > of ceremony to group things before sending the request and error
> > > >> handling
> > > >> > becomes more complex due to partial failures (things like `Try` or
> > > other
> > > >> > mechanisms that serve a similar role are then needed).
> > > >> >
> > > >> > Maybe a way forward is to write API usage examples to help validate
> > > that
> > > >> > the suggested API is indeed easy to use.
> > > >> >
> > > >> > Ismael
> > > >> >
> > > >> > On Wed, Feb 8, 2017 at 4:40 AM, Jay Kreps <[email protected]> wrote:
> > > >> >
> > > >> > > Totally agree on CompletableFuture. Also agree with some of the
> > > rough
> > > >> edges
> > > >> > > on the Consumer.
> > > >> > >
> > > >> > > I don't have much of a leg to stand on with the splitting vs not
> > > >> splitting
> > > >> > > thing, really hard to argue one or the other is better. I guess
> > the
> > > >> one
> > > >> > > observation in watching us try to make good public apis over the
> > > >> years is I
> > > >> > > am kind of in favor of a particular kind of simple. In particular
> > I
> > > >> think
> > > >> > > since the bar is sooo high in support and docs and the community
> > of
> > > >> users
> > > >> > > so broad in the range of their capabilities, it makes it so there
> > is
> > > >> a lot
> > > >> > > of value in dead simple interfaces that don't have a lot of
> > > conceptual
> > > >> > > weight, don't introduce a lot of new classes or concepts or
> > general
> > > >> > > patterns that must be understood to use them correctly. So things
> > > like
> > > >> > > nesting, or the Try class, or async apis, or even just a complex
> > set
> > > >> of
> > > >> > > classes representing arguments or return values kind of all stack
> > > >> > > conceptual burdens on the user to figure out correct usage. So
> > like,
> > > >> for
> > > >> > > example, the Try class is very elegant and represents a whole
> > > >> generalized
> > > >> > > class of possibly completed actions, but the flip side is maybe
> > I'm
> > > >> just a
> > > >> > > working guy who needs to list his kafka topics but doesn't know
> > > Rust,
> > > >> take
> > > >> > > pity on me! :-)
> > > >> > >
> > > >> > > Nit picking aside, super excited to see us progress on this.
> > > >> > >
> > > >> > > -Jay
> > > >> > >
> > > >> > > On Tue, Feb 7, 2017 at 3:46 PM Ismael Juma <[email protected]>
> > > wrote:
> > > >> > >
> > > >> > > > Hi Jay,
> > > >> > > >
> > > >> > > > Thanks for the feedback. Comments inline.
> > > >> > > >
> > > >> > > > On Tue, Feb 7, 2017 at 8:18 PM, Jay Kreps <[email protected]>
> > > wrote:
> > > >> > > > >
> > > >> > > > >    - I think it would be good to not use "get" as the prefix
> > for
> > > >> things
> > > >> > > > >    making remote calls. We've tried to avoid the java getter
> > > >> convention
> > > >> > > > >    entirely (see code style guide), but for remote calls in
> > > >> particular
> > > >> > > it
> > > >> > > > > kind
> > > >> > > > >    of blurs the line between field access and remote RPC in a
> > > way
> > > >> that
> > > >> > > > > leads
> > > >> > > > >    people to trouble. What about, e.g., fetchAllGroups() vs
> > > >> > > > getAllGroups().
> > > >> > > > >
> > > >> > > >
> > > >> > > > Agreed that we should avoid the `get` prefix for remote calls.
> > > >> There are
> > > >> > > a
> > > >> > > > few possible prefixes for the read operations: list, fetch,
> > > >> describe.
> > > >> > > >
> > > >> > > >
> > > >> > > > >    - I think futures and callbacks are a bit of a pain to use.
> > > I'd
> > > >> > > second
> > > >> > > > >    Becket's comment: let's ensure there a common use case
> > > >> motivating
> > > >> > > > these
> > > >> > > > >    that wouldn't be just as easily satisfied with batch
> > > operations
> > > >> > > (which
> > > >> > > > > we
> > > >> > > > >    seem to have at least for some things). In terms of
> > > flexibility
> > > >> > > > > Callbacks >
> > > >> > > > >    Futures > Batch Ops but I think in terms of usability it is
> > > the
> > > >> > > exact
> > > >> > > > >    opposite so let's make sure we have worked out how the API
> > > >> will be
> > > >> > > > used
> > > >> > > > >    before deciding. In particular I think java Futures are
> > often
> > > >> an
> > > >> > > > >    uncomfortable half-way point since calling get() and
> > blocking
> > > >> the
> > > >> > > > > thread is
> > > >> > > > >    often not what you want for chaining sequences of
> > operations
> > > >> in a
> > > >> > > > truly
> > > >> > > > >    async way, so 99% of people just use the future as a way to
> > > >> batch
> > > >> > > > calls.
> > > >> > > > >
> > > >> > > >
> > > >> > > > We should definitely figure out how the APIs are going to be
> > used
> > > >> before
> > > >> > > > deciding. I agree that callbacks are definitely painful and
> > > there's
> > > >> > > little
> > > >> > > > reason to expose them in a modern API unless it's meant to be
> > very
> > > >> low
> > > >> > > > level. When it comes to Futures, I think it's important to
> > > >> distinguish
> > > >> > > what
> > > >> > > > is available in Java 7 and below versus what is available from
> > > Java
> > > >> 8
> > > >> > > > onwards. CompletableFuture makes it much easier to compose/chain
> > > >> > > operations
> > > >> > > > (in a similar vein to java.util.Stream, our own Streams API,
> > etc.)
> > > >> and it
> > > >> > > > gives you the ability to register callbacks if you really want
> > to
> > > >> > > (avoiding
> > > >> > > > the somewhat odd situation in the Producer where we return a
> > > Future
> > > >> _and_
> > > >> > > > allow you to pass a callback).
> > > >> > > >
> > > >> > > >
> > > >> > > > >    - Personally I don't think splitting the admin methods up
> > > >> actually
> > > >> > > > makes
> > > >> > > > >    things more usable. It just makes you have to dig through
> > our
> > > >> > > > > hierarchy. I
> > > >> > > > >    think a flat class with a bunch of operations (like the
> > > >> consumer
> > > >> > > api)
> > > >> > > > is
> > > >> > > > >    probably the easiest for people to grok and find things
> > on. I
> > > >> am
> > > >> > > kind
> > > >> > > > > of a
> > > >> > > > >    dumb PHP programmer at heart, though.
> > > >> > > > >
> > > >> > > >
> > > >> > > > I am not sure it's fair to compare the AdminClient with the
> > > >> Consumer. The
> > > >> > > > former has APIs for a bunch of unrelated APIs (topics, ACLs,
> > > >> configs,
> > > >> > > > consumer groups, delegation tokens, preferred leader election,
> > > >> partition
> > > >> > > > reassignment, etc.) where the latter is pretty specialised. For
> > > >> each of
> > > >> > > the
> > > >> > > > resources, you may have 3-4 operations, it will get confusing
> > > fast.
> > > >> Also,
> > > >> > > > do you really think an API that has one level of grouping will
> > > mean
> > > >> that
> > > >> > > > users have to "dig through our hierarchy"? Or are you concerned
> > > >> that once
> > > >> > > > we go in that direction, there is a danger of making the
> > hierarchy
> > > >> more
> > > >> > > > complicated?
> > > >> > > >
> > > >> > > > Finally, I am not sure I would use the consumer as an example of
> > > >> > > something
> > > >> > > > that is easy to grok. :) The fact that methods behave pretty
> > > >> differently
> > > >> > > > (some are blocking while others only have an effect after poll)
> > > >> with no
> > > >> > > > indication from the type signature or naming convention makes it
> > > >> harder,
> > > >> > > > not easier, to understand.
> > > >> > > >
> > > >> > > > Ismael
> > > >> > > >
> > > >> > >
> > > >>
> > > >
> > > >
> > >
> >

Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

Reply via email to