Hi Jun, Currently KIP-107 uses this API:
Future<Map<TopicPartition, PurgeDataResult>> purgeDataBefore(Map<TopicPartition, Long> offsetForPartition) Are you suggesting that we should provide this: Future<PurgeDataResult> purgeDataBefore(TopicPartition, Long) I think the second solution works and the resulting implementation of KIP-107 will be simpler. The only concern is that this will be a bit inconvenience to use. User will need to have a "for" loop in their own code to call purgeDataBefore for every partition, get results for every partition, and aggregate results. KIP-117 currently only provides topic management operation and it is reasonable to call create(..) once for every topic because user generally creates one topic a time. But operations such as purgeDataBefore(..) seems different in the sense that user generally wants to purge data of all partitions of a topic. I can change change KIP-107 to use the second API if we decide to avoid batch API in AdminClient in the long term with the awareness that user needs to do a bit extra work. Maybe we can minimize this extra work by providing a utility class to combine multiple Future objects into one Future() object. Thanks, Dong On Fri, Feb 10, 2017 at 4:18 PM, Jun Rao <j...@confluent.io> wrote: > Hi, Dong, > > For KIP-107, the purgeDataBefore() api will eventually be added to the > AdminClient too, right? It would be useful to make the apis consistent. > Currently, in KIP-107, we do batching in purgeDataBefore(). In Colin's > current proposal, there is no batching. > > Thanks, > > Jun > > On Thu, Feb 9, 2017 at 10:54 AM, Dong Lin <lindon...@gmail.com> wrote: > > > Thanks for the explanation. This makes sense. > > > > Best, > > Dong > > > > On Thu, Feb 9, 2017 at 10:51 AM, Colin McCabe <cmcc...@apache.org> > wrote: > > > > > On Wed, Feb 8, 2017, at 19:02, Dong Lin wrote: > > > > I am not aware of any semantics that will be caused by sharing > > > > NetworkClient between producer/consumer and AdminClient. But I agree > > that > > > > there is currently no good way to share such an internal class > between > > > > them. And yes, goal is to reduce number of connections. For example, > > say > > > > we > > > > want to enable auto data purge based on committed offset using > > > > AdminClient.purgeDataBefore(...) in a stream processing application, > > > then > > > > in addition to producer or consumer, we will now have AdminClient in > > > > every > > > > job. It means that the the number of connection between server and > > client > > > > will double. > > > > > > > > I have another comment on the KIP. Is AdminClient API supposed to be > > > > thread > > > > safe? > > > > > > Yes. The wiki page states: "The client will be multi-threaded; > multiple > > > threads will be able to safely make calls using the same AdminClient > > > object." > > > > > > > If so, should we mark private variables such as clientTimeoutMs to > > > > be @volatile? Would it be a concern if two threads call > > > > TopicsContext.setServerTimeout(...) concurrently to use different > > > timeout > > > > for their own use-case? > > > > > > The context objects are extremely lightweight and are not intended to > be > > > shared between multiple threads. So each thread would just do > > > client.topics().setClientTimeout(...).create(), and operate on its own > > > TopicsContext. I will add that to the wiki. > > > > > > best, > > > Colin > > > > > > > > > > > Thanks, > > > > Dong > > > > > > > > On Wed, Feb 8, 2017 at 6:50 PM, Jason Gustafson <ja...@confluent.io> > > > > wrote: > > > > > > > > > I'm not too sure sharing NetworkClient is a good idea. The consumer > > > and the > > > > > producer both have request semantics which would be more difficult > to > > > > > reason about if the connections are shared with another client. > Also, > > > the > > > > > NetworkClient is an internal class which is not really meant for > > > users. Do > > > > > we really want to open that up? Is the only benefit saving the > number > > > of > > > > > connections? Seems not worth it in my opinion. > > > > > > > > > > -Jason > > > > > > > > > > On Wed, Feb 8, 2017 at 6:43 PM, Dong Lin <lindon...@gmail.com> > > wrote: > > > > > > > > > > > BTW, the idea to share NetworkClient is suggested by Radai and I > > like > > > > > this > > > > > > idea. > > > > > > > > > > > > On Wed, Feb 8, 2017 at 6:39 PM, Dong Lin <lindon...@gmail.com> > > > wrote: > > > > > > > > > > > > > Hey Colin, > > > > > > > > > > > > > > Thanks for updating the KIP. I have two followup questions: > > > > > > > > > > > > > > - It seems that setCreationConfig(...) is a bit redundant given > > > that > > > > > most > > > > > > > arguments (e.g. topic name, partition num) are already passed > to > > > > > > > TopicsContext.create(...) when user creates topic. Should we > pass > > > > > > > the creationConfig as a parameter to TopicsContext.create(..)? > > > > > > > > > > > > > > - I am wondering if we should also specify the constructor of > the > > > > > > > AdminClient in the KIP. Previously we agreed that AdminClient > > > should > > > > > have > > > > > > > its own thread to poll NetworkClient to send/receive messages. > > > Should > > > > > we > > > > > > > also allow AdminClient to use an existing NetworkClient that is > > > > > provided > > > > > > to > > > > > > > the constructor? This would allow AdminClient to share > > > NetworkClient > > > > > with > > > > > > > producer or consumer in order to reduce the total number of > open > > > > > sockets > > > > > > on > > > > > > > both client and server. > > > > > > > > > > > > > > Thanks, > > > > > > > Dong > > > > > > > > > > > > > > On Wed, Feb 8, 2017 at 5:00 PM, Colin McCabe < > cmcc...@apache.org > > > > > > > > wrote: > > > > > > > > > > > > > >> Hi all, > > > > > > >> > > > > > > >> I made some major revisions to the proposal on the wiki, so > > please > > > > > check > > > > > > >> it out. > > > > > > >> > > > > > > >> The new API is based on Ismael's suggestion of grouping > related > > > APIs. > > > > > > >> There is only one layer of grouping. I think that it's > actually > > > > > pretty > > > > > > >> intuitive. It's also based on the idea of using Futures, > which > > > > > several > > > > > > >> people commented that they'd like to see. > > > > > > >> > > > > > > >> Here's a simple example: > > > > > > >> > > > > > > >> > AdminClient client = new AdminClientImpl(myConfig); > > > > > > >> > try { > > > > > > >> > client.topics().create("foo", 3, (short) 2, false).get(); > > > > > > >> > Collection<String> topicNames = > > client.topics().list(false). > > > > > get(); > > > > > > >> > log.info("Found topics: {}", Utils.mkString(topicNames, > ", > > > ")); > > > > > > >> > Collection<Node> nodes = client.nodes().list().get(); > > > > > > >> > log.info("Found cluster nodes: {}", > Utils.mkString(nodes, > > ", > > > > > ")); > > > > > > >> > } finally { > > > > > > >> > client.close(); > > > > > > >> > } > > > > > > >> > > > > > > >> The good thing is, there is no Try, no 'get' prefixes, no > > messing > > > with > > > > > > >> batch APIs. If there is an error, then Future#get() throws an > > > > > > >> ExecutionException which wraps the relevant exception in the > > > standard > > > > > > >> Java way. > > > > > > >> > > > > > > >> Here's a slightly less simple example: > > > > > > >> > > > > > > >> > AdminClient client = new AdminClientImpl(myConfig); > > > > > > >> > try { > > > > > > >> > List<Future<Void>> futures = new LinkedList<>(); > > > > > > >> > for (String topicName: myNewTopicNames) { > > > > > > >> > creations.add(client.topics(). > > > > > > >> > setClientTimeout(30000).setCreationConfig( > > > myTopicConfig). > > > > > > >> > create(topicName, 3, (short) 2, false)); > > > > > > >> > } > > > > > > >> > Futures.waitForAll(futures); > > > > > > >> > } finally { > > > > > > >> > client.close(); > > > > > > >> > } > > > > > > >> > > > > > > >> I went with Futures because I feel like ought to have some > > option > > > for > > > > > > >> doing async. It's a style of programming that has become a > lot > > > more > > > > > > >> popular with the rise of Node.js, Twisted python, etc. etc. > > > Also, as > > > > > > >> Ismael commented, Java 8 CompletableFuture is going to make > > Java's > > > > > > >> support for fluent async programming a lot stronger by > allowing > > > call > > > > > > >> chaining and much more. > > > > > > >> > > > > > > >> If we are going to support async, the simplest thing is just > to > > > make > > > > > > >> everything return a future and let people call get() if they > > want > > > to > > > > > run > > > > > > >> synchronously. Having a mix of async and sync APIs is just > > going > > > to > > > > > be > > > > > > >> confusing and redundant. > > > > > > >> > > > > > > >> I think we should try to avoid creating single functions that > > > start > > > > > > >> multiple requests if we can. It makes things much uglier. It > > > means > > > > > > >> that you have to have some kind of request class that wraps up > > the > > > > > > >> request the user is trying to create, so that you can handle > an > > > array > > > > > of > > > > > > >> those requests. The return value has to be something like > > > Map<Node, > > > > > > >> Try<Value>> to represent which nodes failed and succeeded. > This > > > is > > > > > the > > > > > > >> kind of stuff that, in my opinion, makes people scratch their > > > heads. > > > > > > >> > > > > > > >> If we need to, we can still get some of the efficiency > benefits > > of > > > > > batch > > > > > > >> APIs by waiting for a millisecond or two before sending out a > > > topic > > > > > > >> create() request to see if other create() requests arrive. If > > > so, we > > > > > > >> can coalesce them. It might be better to figure out if this > is > > an > > > > > > >> actual performance issue before implementing it, though. > > > > > > >> > > > > > > >> I think it would be good to get something out there, annotate > it > > > as > > > > > > >> @Unstable, and get feedback from people building against trunk > > and > > > > > using > > > > > > >> it. We have removed or changed @Unstable APIs in streams > > before, > > > so I > > > > > > >> don't think we should worry that it will get set in stone > > > prematurely. > > > > > > >> The AdminClient API should get much less developer use than > > > anything > > > > > in > > > > > > >> streams, so changing an unstable API should be much easier. > > > > > > >> > > > > > > >> best, > > > > > > >> Colin > > > > > > >> > > > > > > >> > > > > > > >> On Wed, Feb 8, 2017, at 07:49, Ismael Juma wrote: > > > > > > >> > Thanks for elaborating Jay. I totally agree that we have to > be > > > very > > > > > > >> > careful > > > > > > >> > in how we use our complexity budget. Easier said than done > > when > > > > > people > > > > > > >> > don't agree on what is complex and what is simple. :) For > > > example, I > > > > > > >> > think > > > > > > >> > batch APIs are a significant source of complexity as you > have > > > to do > > > > > a > > > > > > >> > bunch > > > > > > >> > of ceremony to group things before sending the request and > > error > > > > > > >> handling > > > > > > >> > becomes more complex due to partial failures (things like > > `Try` > > > or > > > > > > other > > > > > > >> > mechanisms that serve a similar role are then needed). > > > > > > >> > > > > > > > >> > Maybe a way forward is to write API usage examples to help > > > validate > > > > > > that > > > > > > >> > the suggested API is indeed easy to use. > > > > > > >> > > > > > > > >> > Ismael > > > > > > >> > > > > > > > >> > On Wed, Feb 8, 2017 at 4:40 AM, Jay Kreps <j...@confluent.io > > > > > wrote: > > > > > > >> > > > > > > > >> > > Totally agree on CompletableFuture. Also agree with some > of > > > the > > > > > > rough > > > > > > >> edges > > > > > > >> > > on the Consumer. > > > > > > >> > > > > > > > > >> > > I don't have much of a leg to stand on with the splitting > vs > > > not > > > > > > >> splitting > > > > > > >> > > thing, really hard to argue one or the other is better. I > > > guess > > > > > the > > > > > > >> one > > > > > > >> > > observation in watching us try to make good public apis > over > > > the > > > > > > >> years is I > > > > > > >> > > am kind of in favor of a particular kind of simple. In > > > particular > > > > > I > > > > > > >> think > > > > > > >> > > since the bar is sooo high in support and docs and the > > > community > > > > > of > > > > > > >> users > > > > > > >> > > so broad in the range of their capabilities, it makes it > so > > > there > > > > > is > > > > > > >> a lot > > > > > > >> > > of value in dead simple interfaces that don't have a lot > of > > > > > > conceptual > > > > > > >> > > weight, don't introduce a lot of new classes or concepts > or > > > > > general > > > > > > >> > > patterns that must be understood to use them correctly. So > > > things > > > > > > like > > > > > > >> > > nesting, or the Try class, or async apis, or even just a > > > complex > > > > > set > > > > > > >> of > > > > > > >> > > classes representing arguments or return values kind of > all > > > stack > > > > > > >> > > conceptual burdens on the user to figure out correct > usage. > > So > > > > > like, > > > > > > >> for > > > > > > >> > > example, the Try class is very elegant and represents a > > whole > > > > > > >> generalized > > > > > > >> > > class of possibly completed actions, but the flip side is > > > maybe > > > > > I'm > > > > > > >> just a > > > > > > >> > > working guy who needs to list his kafka topics but doesn't > > > know > > > > > > Rust, > > > > > > >> take > > > > > > >> > > pity on me! :-) > > > > > > >> > > > > > > > > >> > > Nit picking aside, super excited to see us progress on > this. > > > > > > >> > > > > > > > > >> > > -Jay > > > > > > >> > > > > > > > > >> > > On Tue, Feb 7, 2017 at 3:46 PM Ismael Juma < > > ism...@juma.me.uk > > > > > > > > > > wrote: > > > > > > >> > > > > > > > > >> > > > Hi Jay, > > > > > > >> > > > > > > > > > >> > > > Thanks for the feedback. Comments inline. > > > > > > >> > > > > > > > > > >> > > > On Tue, Feb 7, 2017 at 8:18 PM, Jay Kreps < > > j...@confluent.io > > > > > > > > > > wrote: > > > > > > >> > > > > > > > > > > >> > > > > - I think it would be good to not use "get" as the > > > prefix > > > > > for > > > > > > >> things > > > > > > >> > > > > making remote calls. We've tried to avoid the java > > > getter > > > > > > >> convention > > > > > > >> > > > > entirely (see code style guide), but for remote > calls > > > in > > > > > > >> particular > > > > > > >> > > it > > > > > > >> > > > > kind > > > > > > >> > > > > of blurs the line between field access and remote > RPC > > > in a > > > > > > way > > > > > > >> that > > > > > > >> > > > > leads > > > > > > >> > > > > people to trouble. What about, e.g., > fetchAllGroups() > > > vs > > > > > > >> > > > getAllGroups(). > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > Agreed that we should avoid the `get` prefix for remote > > > calls. > > > > > > >> There are > > > > > > >> > > a > > > > > > >> > > > few possible prefixes for the read operations: list, > > fetch, > > > > > > >> describe. > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > - I think futures and callbacks are a bit of a pain > > to > > > use. > > > > > > I'd > > > > > > >> > > second > > > > > > >> > > > > Becket's comment: let's ensure there a common use > > case > > > > > > >> motivating > > > > > > >> > > > these > > > > > > >> > > > > that wouldn't be just as easily satisfied with > batch > > > > > > operations > > > > > > >> > > (which > > > > > > >> > > > > we > > > > > > >> > > > > seem to have at least for some things). In terms of > > > > > > flexibility > > > > > > >> > > > > Callbacks > > > > > > > >> > > > > Futures > Batch Ops but I think in terms of > usability > > > it is > > > > > > the > > > > > > >> > > exact > > > > > > >> > > > > opposite so let's make sure we have worked out how > > the > > > API > > > > > > >> will be > > > > > > >> > > > used > > > > > > >> > > > > before deciding. In particular I think java Futures > > are > > > > > often > > > > > > >> an > > > > > > >> > > > > uncomfortable half-way point since calling get() > and > > > > > blocking > > > > > > >> the > > > > > > >> > > > > thread is > > > > > > >> > > > > often not what you want for chaining sequences of > > > > > operations > > > > > > >> in a > > > > > > >> > > > truly > > > > > > >> > > > > async way, so 99% of people just use the future as > a > > > way to > > > > > > >> batch > > > > > > >> > > > calls. > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > We should definitely figure out how the APIs are going > to > > be > > > > > used > > > > > > >> before > > > > > > >> > > > deciding. I agree that callbacks are definitely painful > > and > > > > > > there's > > > > > > >> > > little > > > > > > >> > > > reason to expose them in a modern API unless it's meant > to > > > be > > > > > very > > > > > > >> low > > > > > > >> > > > level. When it comes to Futures, I think it's important > to > > > > > > >> distinguish > > > > > > >> > > what > > > > > > >> > > > is available in Java 7 and below versus what is > available > > > from > > > > > > Java > > > > > > >> 8 > > > > > > >> > > > onwards. CompletableFuture makes it much easier to > > > compose/chain > > > > > > >> > > operations > > > > > > >> > > > (in a similar vein to java.util.Stream, our own Streams > > API, > > > > > etc.) > > > > > > >> and it > > > > > > >> > > > gives you the ability to register callbacks if you > really > > > want > > > > > to > > > > > > >> > > (avoiding > > > > > > >> > > > the somewhat odd situation in the Producer where we > > return a > > > > > > Future > > > > > > >> _and_ > > > > > > >> > > > allow you to pass a callback). > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > - Personally I don't think splitting the admin > > methods > > > up > > > > > > >> actually > > > > > > >> > > > makes > > > > > > >> > > > > things more usable. It just makes you have to dig > > > through > > > > > our > > > > > > >> > > > > hierarchy. I > > > > > > >> > > > > think a flat class with a bunch of operations (like > > the > > > > > > >> consumer > > > > > > >> > > api) > > > > > > >> > > > is > > > > > > >> > > > > probably the easiest for people to grok and find > > things > > > > > on. I > > > > > > >> am > > > > > > >> > > kind > > > > > > >> > > > > of a > > > > > > >> > > > > dumb PHP programmer at heart, though. > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > I am not sure it's fair to compare the AdminClient with > > the > > > > > > >> Consumer. The > > > > > > >> > > > former has APIs for a bunch of unrelated APIs (topics, > > ACLs, > > > > > > >> configs, > > > > > > >> > > > consumer groups, delegation tokens, preferred leader > > > election, > > > > > > >> partition > > > > > > >> > > > reassignment, etc.) where the latter is pretty > > specialised. > > > For > > > > > > >> each of > > > > > > >> > > the > > > > > > >> > > > resources, you may have 3-4 operations, it will get > > > confusing > > > > > > fast. > > > > > > >> Also, > > > > > > >> > > > do you really think an API that has one level of > grouping > > > will > > > > > > mean > > > > > > >> that > > > > > > >> > > > users have to "dig through our hierarchy"? Or are you > > > concerned > > > > > > >> that once > > > > > > >> > > > we go in that direction, there is a danger of making the > > > > > hierarchy > > > > > > >> more > > > > > > >> > > > complicated? > > > > > > >> > > > > > > > > > >> > > > Finally, I am not sure I would use the consumer as an > > > example of > > > > > > >> > > something > > > > > > >> > > > that is easy to grok. :) The fact that methods behave > > pretty > > > > > > >> differently > > > > > > >> > > > (some are blocking while others only have an effect > after > > > poll) > > > > > > >> with no > > > > > > >> > > > indication from the type signature or naming convention > > > makes it > > > > > > >> harder, > > > > > > >> > > > not easier, to understand. > > > > > > >> > > > > > > > > > >> > > > Ismael > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >