Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

Dong Lin Fri, 10 Feb 2017 17:06:21 -0800

Hi Jun,

Currently KIP-107 uses this API:


Future<Map<TopicPartition, PurgeDataResult>>
purgeDataBefore(Map<TopicPartition, Long> offsetForPartition)

Are you suggesting that we should provide this:

Future<PurgeDataResult> purgeDataBefore(TopicPartition, Long)

I think the second solution works and the resulting implementation of
KIP-107 will be simpler. The only concern is that this will be a bit
inconvenience to use. User will need to have a "for" loop in their own code
to call purgeDataBefore for every partition, get results for every
partition, and aggregate results.

KIP-117 currently only provides topic management operation and it is
reasonable to call create(..) once for every topic because user generally
creates one topic a time. But operations such as purgeDataBefore(..) seems
different in the sense that user generally wants to purge data of all
partitions of a topic.

I can change change KIP-107 to use the second API if we decide to avoid
batch API in AdminClient in the long term with the awareness that user
needs to do a bit extra work. Maybe we can minimize this extra work by
providing a utility class to combine multiple Future objects into one
Future() object.

Thanks,
Dong



On Fri, Feb 10, 2017 at 4:18 PM, Jun Rao <[email protected]> wrote:

> Hi, Dong,
>
> For KIP-107, the purgeDataBefore() api will eventually be added to the
> AdminClient too, right? It would be useful to make the apis consistent.
> Currently, in KIP-107, we do batching in purgeDataBefore(). In Colin's
> current proposal, there is no batching.
>
> Thanks,
>
> Jun
>
> On Thu, Feb 9, 2017 at 10:54 AM, Dong Lin <[email protected]> wrote:
>
> > Thanks for the explanation. This makes sense.
> >
> > Best,
> > Dong
> >
> > On Thu, Feb 9, 2017 at 10:51 AM, Colin McCabe <[email protected]>
> wrote:
> >
> > > On Wed, Feb 8, 2017, at 19:02, Dong Lin wrote:
> > > > I am not aware of any semantics that will be caused by sharing
> > > > NetworkClient between producer/consumer and AdminClient. But I agree
> > that
> > > > there is currently no good way to share such an internal class
> between
> > > > them. And yes, goal is to reduce number of connections. For example,
> > say
> > > > we
> > > > want to enable auto data purge based on committed offset using
> > > > AdminClient.purgeDataBefore(...) in a stream processing application,
> > > then
> > > > in addition to producer or consumer, we will now have AdminClient in
> > > > every
> > > > job. It means that the the number of connection between server and
> > client
> > > > will double.
> > > >
> > > > I have another comment on the KIP. Is AdminClient API supposed to be
> > > > thread
> > > > safe?
> > >
> > > Yes.  The wiki page states: "The client will be multi-threaded;
> multiple
> > > threads will be able to safely make calls using the same AdminClient
> > > object."
> > >
> > > > If so, should we mark private variables such as clientTimeoutMs to
> > > > be @volatile? Would it be a concern if two threads call
> > > > TopicsContext.setServerTimeout(...) concurrently to use different
> > > timeout
> > > > for their own use-case?
> > >
> > > The context objects are extremely lightweight and are not intended to
> be
> > > shared between multiple threads.  So each thread would just do
> > > client.topics().setClientTimeout(...).create(), and operate on its own
> > > TopicsContext.  I will add that to the wiki.
> > >
> > > best,
> > > Colin
> > >
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > > > On Wed, Feb 8, 2017 at 6:50 PM, Jason Gustafson <[email protected]>
> > > > wrote:
> > > >
> > > > > I'm not too sure sharing NetworkClient is a good idea. The consumer
> > > and the
> > > > > producer both have request semantics which would be more difficult
> to
> > > > > reason about if the connections are shared with another client.
> Also,
> > > the
> > > > > NetworkClient is an internal class which is not really meant for
> > > users. Do
> > > > > we really want to open that up? Is the only benefit saving the
> number
> > > of
> > > > > connections? Seems not worth it in my opinion.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Wed, Feb 8, 2017 at 6:43 PM, Dong Lin <[email protected]>
> > wrote:
> > > > >
> > > > > > BTW, the idea to share NetworkClient is suggested by Radai and I
> > like
> > > > > this
> > > > > > idea.
> > > > > >
> > > > > > On Wed, Feb 8, 2017 at 6:39 PM, Dong Lin <[email protected]>
> > > wrote:
> > > > > >
> > > > > > > Hey Colin,
> > > > > > >
> > > > > > > Thanks for updating the KIP. I have two followup questions:
> > > > > > >
> > > > > > > - It seems that setCreationConfig(...) is a bit redundant given
> > > that
> > > > > most
> > > > > > > arguments (e.g. topic name, partition num) are already passed
> to
> > > > > > > TopicsContext.create(...) when user creates topic. Should we
> pass
> > > > > > > the creationConfig as a parameter to TopicsContext.create(..)?
> > > > > > >
> > > > > > > - I am wondering if we should also specify the constructor of
> the
> > > > > > > AdminClient in the KIP. Previously we agreed that AdminClient
> > > should
> > > > > have
> > > > > > > its own thread to poll NetworkClient to send/receive messages.
> > > Should
> > > > > we
> > > > > > > also allow AdminClient to use an existing NetworkClient that is
> > > > > provided
> > > > > > to
> > > > > > > the constructor? This would allow AdminClient to share
> > > NetworkClient
> > > > > with
> > > > > > > producer or consumer in order to reduce the total number of
> open
> > > > > sockets
> > > > > > on
> > > > > > > both client and server.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dong
> > > > > > >
> > > > > > > On Wed, Feb 8, 2017 at 5:00 PM, Colin McCabe <
> [email protected]
> > >
> > > > > wrote:
> > > > > > >
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> I made some major revisions to the proposal on the wiki, so
> > please
> > > > > check
> > > > > > >> it out.
> > > > > > >>
> > > > > > >> The new API is based on Ismael's suggestion of grouping
> related
> > > APIs.
> > > > > > >> There is only one layer of grouping.  I think that it's
> actually
> > > > > pretty
> > > > > > >> intuitive.  It's also based on the idea of using Futures,
> which
> > > > > several
> > > > > > >> people commented that they'd like to see.
> > > > > > >>
> > > > > > >> Here's a simple example:
> > > > > > >>
> > > > > > >>  > AdminClient client = new AdminClientImpl(myConfig);
> > > > > > >>  > try {
> > > > > > >>  >   client.topics().create("foo", 3, (short) 2, false).get();
> > > > > > >>  >   Collection<String> topicNames =
> > client.topics().list(false).
> > > > > get();
> > > > > > >>  >   log.info("Found topics: {}", Utils.mkString(topicNames,
> ",
> > > "));
> > > > > > >>  >   Collection<Node> nodes = client.nodes().list().get();
> > > > > > >>  >   log.info("Found cluster nodes: {}",
> Utils.mkString(nodes,
> > ",
> > > > > "));
> > > > > > >>  > } finally {
> > > > > > >>  >   client.close();
> > > > > > >>  > }
> > > > > > >>
> > > > > > >> The good thing is, there is no Try, no 'get' prefixes, no
> > messing
> > > with
> > > > > > >> batch APIs.  If there is an error, then Future#get() throws an
> > > > > > >> ExecutionException which wraps the relevant exception in the
> > > standard
> > > > > > >> Java way.
> > > > > > >>
> > > > > > >> Here's a slightly less simple example:
> > > > > > >>
> > > > > > >> > AdminClient client = new AdminClientImpl(myConfig);
> > > > > > >> > try {
> > > > > > >> >   List<Future<Void>> futures = new LinkedList<>();
> > > > > > >> >   for (String topicName: myNewTopicNames) {
> > > > > > >> >     creations.add(client.topics().
> > > > > > >> >         setClientTimeout(30000).setCreationConfig(
> > > myTopicConfig).
> > > > > > >> >           create(topicName, 3, (short) 2, false));
> > > > > > >> >   }
> > > > > > >> >   Futures.waitForAll(futures);
> > > > > > >> > } finally {
> > > > > > >> >   client.close();
> > > > > > >> > }
> > > > > > >>
> > > > > > >> I went with Futures because I feel like ought to have some
> > option
> > > for
> > > > > > >> doing async.  It's a style of programming that has become a
> lot
> > > more
> > > > > > >> popular with the rise of Node.js, Twisted python, etc. etc.
> > > Also, as
> > > > > > >> Ismael commented, Java 8 CompletableFuture is going to make
> > Java's
> > > > > > >> support for fluent async programming a lot stronger by
> allowing
> > > call
> > > > > > >> chaining and much more.
> > > > > > >>
> > > > > > >> If we are going to support async, the simplest thing is just
> to
> > > make
> > > > > > >> everything return a future and let people call get() if they
> > want
> > > to
> > > > > run
> > > > > > >> synchronously.  Having a mix of async and sync APIs is just
> > going
> > > to
> > > > > be
> > > > > > >> confusing and redundant.
> > > > > > >>
> > > > > > >> I think we should try to avoid creating single functions that
> > > start
> > > > > > >> multiple requests if we can.  It makes things much uglier.  It
> > > means
> > > > > > >> that you have to have some kind of request class that wraps up
> > the
> > > > > > >> request the user is trying to create, so that you can handle
> an
> > > array
> > > > > of
> > > > > > >> those requests.  The return value has to be something like
> > > Map<Node,
> > > > > > >> Try<Value>> to represent which nodes failed and succeeded.
> This
> > > is
> > > > > the
> > > > > > >> kind of stuff that, in my opinion, makes people scratch their
> > > heads.
> > > > > > >>
> > > > > > >> If we need to, we can still get some of the efficiency
> benefits
> > of
> > > > > batch
> > > > > > >> APIs by waiting for a millisecond or two before sending out a
> > > topic
> > > > > > >> create() request to see if other create() requests arrive.  If
> > > so, we
> > > > > > >> can coalesce them.  It might be better to figure out if this
> is
> > an
> > > > > > >> actual performance issue before implementing it, though.
> > > > > > >>
> > > > > > >> I think it would be good to get something out there, annotate
> it
> > > as
> > > > > > >> @Unstable, and get feedback from people building against trunk
> > and
> > > > > using
> > > > > > >> it.  We have removed or changed @Unstable APIs in streams
> > before,
> > > so I
> > > > > > >> don't think we should worry that it will get set in stone
> > > prematurely.
> > > > > > >> The AdminClient API should get much less developer use than
> > > anything
> > > > > in
> > > > > > >> streams, so changing an unstable API should be much easier.
> > > > > > >>
> > > > > > >> best,
> > > > > > >> Colin
> > > > > > >>
> > > > > > >>
> > > > > > >> On Wed, Feb 8, 2017, at 07:49, Ismael Juma wrote:
> > > > > > >> > Thanks for elaborating Jay. I totally agree that we have to
> be
> > > very
> > > > > > >> > careful
> > > > > > >> > in how we use our complexity budget. Easier said than done
> > when
> > > > > people
> > > > > > >> > don't agree on what is complex and what is simple. :) For
> > > example, I
> > > > > > >> > think
> > > > > > >> > batch APIs are a significant source of complexity as you
> have
> > > to do
> > > > > a
> > > > > > >> > bunch
> > > > > > >> > of ceremony to group things before sending the request and
> > error
> > > > > > >> handling
> > > > > > >> > becomes more complex due to partial failures (things like
> > `Try`
> > > or
> > > > > > other
> > > > > > >> > mechanisms that serve a similar role are then needed).
> > > > > > >> >
> > > > > > >> > Maybe a way forward is to write API usage examples to help
> > > validate
> > > > > > that
> > > > > > >> > the suggested API is indeed easy to use.
> > > > > > >> >
> > > > > > >> > Ismael
> > > > > > >> >
> > > > > > >> > On Wed, Feb 8, 2017 at 4:40 AM, Jay Kreps <[email protected]
> >
> > > wrote:
> > > > > > >> >
> > > > > > >> > > Totally agree on CompletableFuture. Also agree with some
> of
> > > the
> > > > > > rough
> > > > > > >> edges
> > > > > > >> > > on the Consumer.
> > > > > > >> > >
> > > > > > >> > > I don't have much of a leg to stand on with the splitting
> vs
> > > not
> > > > > > >> splitting
> > > > > > >> > > thing, really hard to argue one or the other is better. I
> > > guess
> > > > > the
> > > > > > >> one
> > > > > > >> > > observation in watching us try to make good public apis
> over
> > > the
> > > > > > >> years is I
> > > > > > >> > > am kind of in favor of a particular kind of simple. In
> > > particular
> > > > > I
> > > > > > >> think
> > > > > > >> > > since the bar is sooo high in support and docs and the
> > > community
> > > > > of
> > > > > > >> users
> > > > > > >> > > so broad in the range of their capabilities, it makes it
> so
> > > there
> > > > > is
> > > > > > >> a lot
> > > > > > >> > > of value in dead simple interfaces that don't have a lot
> of
> > > > > > conceptual
> > > > > > >> > > weight, don't introduce a lot of new classes or concepts
> or
> > > > > general
> > > > > > >> > > patterns that must be understood to use them correctly. So
> > > things
> > > > > > like
> > > > > > >> > > nesting, or the Try class, or async apis, or even just a
> > > complex
> > > > > set
> > > > > > >> of
> > > > > > >> > > classes representing arguments or return values kind of
> all
> > > stack
> > > > > > >> > > conceptual burdens on the user to figure out correct
> usage.
> > So
> > > > > like,
> > > > > > >> for
> > > > > > >> > > example, the Try class is very elegant and represents a
> > whole
> > > > > > >> generalized
> > > > > > >> > > class of possibly completed actions, but the flip side is
> > > maybe
> > > > > I'm
> > > > > > >> just a
> > > > > > >> > > working guy who needs to list his kafka topics but doesn't
> > > know
> > > > > > Rust,
> > > > > > >> take
> > > > > > >> > > pity on me! :-)
> > > > > > >> > >
> > > > > > >> > > Nit picking aside, super excited to see us progress on
> this.
> > > > > > >> > >
> > > > > > >> > > -Jay
> > > > > > >> > >
> > > > > > >> > > On Tue, Feb 7, 2017 at 3:46 PM Ismael Juma <
> > [email protected]
> > > >
> > > > > > wrote:
> > > > > > >> > >
> > > > > > >> > > > Hi Jay,
> > > > > > >> > > >
> > > > > > >> > > > Thanks for the feedback. Comments inline.
> > > > > > >> > > >
> > > > > > >> > > > On Tue, Feb 7, 2017 at 8:18 PM, Jay Kreps <
> > [email protected]
> > > >
> > > > > > wrote:
> > > > > > >> > > > >
> > > > > > >> > > > >    - I think it would be good to not use "get" as the
> > > prefix
> > > > > for
> > > > > > >> things
> > > > > > >> > > > >    making remote calls. We've tried to avoid the java
> > > getter
> > > > > > >> convention
> > > > > > >> > > > >    entirely (see code style guide), but for remote
> calls
> > > in
> > > > > > >> particular
> > > > > > >> > > it
> > > > > > >> > > > > kind
> > > > > > >> > > > >    of blurs the line between field access and remote
> RPC
> > > in a
> > > > > > way
> > > > > > >> that
> > > > > > >> > > > > leads
> > > > > > >> > > > >    people to trouble. What about, e.g.,
> fetchAllGroups()
> > > vs
> > > > > > >> > > > getAllGroups().
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > > > Agreed that we should avoid the `get` prefix for remote
> > > calls.
> > > > > > >> There are
> > > > > > >> > > a
> > > > > > >> > > > few possible prefixes for the read operations: list,
> > fetch,
> > > > > > >> describe.
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > >    - I think futures and callbacks are a bit of a pain
> > to
> > > use.
> > > > > > I'd
> > > > > > >> > > second
> > > > > > >> > > > >    Becket's comment: let's ensure there a common use
> > case
> > > > > > >> motivating
> > > > > > >> > > > these
> > > > > > >> > > > >    that wouldn't be just as easily satisfied with
> batch
> > > > > > operations
> > > > > > >> > > (which
> > > > > > >> > > > > we
> > > > > > >> > > > >    seem to have at least for some things). In terms of
> > > > > > flexibility
> > > > > > >> > > > > Callbacks >
> > > > > > >> > > > >    Futures > Batch Ops but I think in terms of
> usability
> > > it is
> > > > > > the
> > > > > > >> > > exact
> > > > > > >> > > > >    opposite so let's make sure we have worked out how
> > the
> > > API
> > > > > > >> will be
> > > > > > >> > > > used
> > > > > > >> > > > >    before deciding. In particular I think java Futures
> > are
> > > > > often
> > > > > > >> an
> > > > > > >> > > > >    uncomfortable half-way point since calling get()
> and
> > > > > blocking
> > > > > > >> the
> > > > > > >> > > > > thread is
> > > > > > >> > > > >    often not what you want for chaining sequences of
> > > > > operations
> > > > > > >> in a
> > > > > > >> > > > truly
> > > > > > >> > > > >    async way, so 99% of people just use the future as
> a
> > > way to
> > > > > > >> batch
> > > > > > >> > > > calls.
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > > > We should definitely figure out how the APIs are going
> to
> > be
> > > > > used
> > > > > > >> before
> > > > > > >> > > > deciding. I agree that callbacks are definitely painful
> > and
> > > > > > there's
> > > > > > >> > > little
> > > > > > >> > > > reason to expose them in a modern API unless it's meant
> to
> > > be
> > > > > very
> > > > > > >> low
> > > > > > >> > > > level. When it comes to Futures, I think it's important
> to
> > > > > > >> distinguish
> > > > > > >> > > what
> > > > > > >> > > > is available in Java 7 and below versus what is
> available
> > > from
> > > > > > Java
> > > > > > >> 8
> > > > > > >> > > > onwards. CompletableFuture makes it much easier to
> > > compose/chain
> > > > > > >> > > operations
> > > > > > >> > > > (in a similar vein to java.util.Stream, our own Streams
> > API,
> > > > > etc.)
> > > > > > >> and it
> > > > > > >> > > > gives you the ability to register callbacks if you
> really
> > > want
> > > > > to
> > > > > > >> > > (avoiding
> > > > > > >> > > > the somewhat odd situation in the Producer where we
> > return a
> > > > > > Future
> > > > > > >> _and_
> > > > > > >> > > > allow you to pass a callback).
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > >    - Personally I don't think splitting the admin
> > methods
> > > up
> > > > > > >> actually
> > > > > > >> > > > makes
> > > > > > >> > > > >    things more usable. It just makes you have to dig
> > > through
> > > > > our
> > > > > > >> > > > > hierarchy. I
> > > > > > >> > > > >    think a flat class with a bunch of operations (like
> > the
> > > > > > >> consumer
> > > > > > >> > > api)
> > > > > > >> > > > is
> > > > > > >> > > > >    probably the easiest for people to grok and find
> > things
> > > > > on. I
> > > > > > >> am
> > > > > > >> > > kind
> > > > > > >> > > > > of a
> > > > > > >> > > > >    dumb PHP programmer at heart, though.
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > > > I am not sure it's fair to compare the AdminClient with
> > the
> > > > > > >> Consumer. The
> > > > > > >> > > > former has APIs for a bunch of unrelated APIs (topics,
> > ACLs,
> > > > > > >> configs,
> > > > > > >> > > > consumer groups, delegation tokens, preferred leader
> > > election,
> > > > > > >> partition
> > > > > > >> > > > reassignment, etc.) where the latter is pretty
> > specialised.
> > > For
> > > > > > >> each of
> > > > > > >> > > the
> > > > > > >> > > > resources, you may have 3-4 operations, it will get
> > > confusing
> > > > > > fast.
> > > > > > >> Also,
> > > > > > >> > > > do you really think an API that has one level of
> grouping
> > > will
> > > > > > mean
> > > > > > >> that
> > > > > > >> > > > users have to "dig through our hierarchy"? Or are you
> > > concerned
> > > > > > >> that once
> > > > > > >> > > > we go in that direction, there is a danger of making the
> > > > > hierarchy
> > > > > > >> more
> > > > > > >> > > > complicated?
> > > > > > >> > > >
> > > > > > >> > > > Finally, I am not sure I would use the consumer as an
> > > example of
> > > > > > >> > > something
> > > > > > >> > > > that is easy to grok. :) The fact that methods behave
> > pretty
> > > > > > >> differently
> > > > > > >> > > > (some are blocking while others only have an effect
> after
> > > poll)
> > > > > > >> with no
> > > > > > >> > > > indication from the type signature or naming convention
> > > makes it
> > > > > > >> harder,
> > > > > > >> > > > not easier, to understand.
> > > > > > >> > > >
> > > > > > >> > > > Ismael
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>

Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

Reply via email to