On Wed, Feb 8, 2017, at 19:02, Dong Lin wrote: > I am not aware of any semantics that will be caused by sharing > NetworkClient between producer/consumer and AdminClient. But I agree that > there is currently no good way to share such an internal class between > them. And yes, goal is to reduce number of connections. For example, say > we > want to enable auto data purge based on committed offset using > AdminClient.purgeDataBefore(...) in a stream processing application, then > in addition to producer or consumer, we will now have AdminClient in > every > job. It means that the the number of connection between server and client > will double. > > I have another comment on the KIP. Is AdminClient API supposed to be > thread > safe?
Yes. The wiki page states: "The client will be multi-threaded; multiple threads will be able to safely make calls using the same AdminClient object." > If so, should we mark private variables such as clientTimeoutMs to > be @volatile? Would it be a concern if two threads call > TopicsContext.setServerTimeout(...) concurrently to use different timeout > for their own use-case? The context objects are extremely lightweight and are not intended to be shared between multiple threads. So each thread would just do client.topics().setClientTimeout(...).create(), and operate on its own TopicsContext. I will add that to the wiki. best, Colin > > Thanks, > Dong > > On Wed, Feb 8, 2017 at 6:50 PM, Jason Gustafson <ja...@confluent.io> > wrote: > > > I'm not too sure sharing NetworkClient is a good idea. The consumer and the > > producer both have request semantics which would be more difficult to > > reason about if the connections are shared with another client. Also, the > > NetworkClient is an internal class which is not really meant for users. Do > > we really want to open that up? Is the only benefit saving the number of > > connections? Seems not worth it in my opinion. > > > > -Jason > > > > On Wed, Feb 8, 2017 at 6:43 PM, Dong Lin <lindon...@gmail.com> wrote: > > > > > BTW, the idea to share NetworkClient is suggested by Radai and I like > > this > > > idea. > > > > > > On Wed, Feb 8, 2017 at 6:39 PM, Dong Lin <lindon...@gmail.com> wrote: > > > > > > > Hey Colin, > > > > > > > > Thanks for updating the KIP. I have two followup questions: > > > > > > > > - It seems that setCreationConfig(...) is a bit redundant given that > > most > > > > arguments (e.g. topic name, partition num) are already passed to > > > > TopicsContext.create(...) when user creates topic. Should we pass > > > > the creationConfig as a parameter to TopicsContext.create(..)? > > > > > > > > - I am wondering if we should also specify the constructor of the > > > > AdminClient in the KIP. Previously we agreed that AdminClient should > > have > > > > its own thread to poll NetworkClient to send/receive messages. Should > > we > > > > also allow AdminClient to use an existing NetworkClient that is > > provided > > > to > > > > the constructor? This would allow AdminClient to share NetworkClient > > with > > > > producer or consumer in order to reduce the total number of open > > sockets > > > on > > > > both client and server. > > > > > > > > Thanks, > > > > Dong > > > > > > > > On Wed, Feb 8, 2017 at 5:00 PM, Colin McCabe <cmcc...@apache.org> > > wrote: > > > > > > > >> Hi all, > > > >> > > > >> I made some major revisions to the proposal on the wiki, so please > > check > > > >> it out. > > > >> > > > >> The new API is based on Ismael's suggestion of grouping related APIs. > > > >> There is only one layer of grouping. I think that it's actually > > pretty > > > >> intuitive. It's also based on the idea of using Futures, which > > several > > > >> people commented that they'd like to see. > > > >> > > > >> Here's a simple example: > > > >> > > > >> > AdminClient client = new AdminClientImpl(myConfig); > > > >> > try { > > > >> > client.topics().create("foo", 3, (short) 2, false).get(); > > > >> > Collection<String> topicNames = client.topics().list(false). > > get(); > > > >> > log.info("Found topics: {}", Utils.mkString(topicNames, ", ")); > > > >> > Collection<Node> nodes = client.nodes().list().get(); > > > >> > log.info("Found cluster nodes: {}", Utils.mkString(nodes, ", > > ")); > > > >> > } finally { > > > >> > client.close(); > > > >> > } > > > >> > > > >> The good thing is, there is no Try, no 'get' prefixes, no messing with > > > >> batch APIs. If there is an error, then Future#get() throws an > > > >> ExecutionException which wraps the relevant exception in the standard > > > >> Java way. > > > >> > > > >> Here's a slightly less simple example: > > > >> > > > >> > AdminClient client = new AdminClientImpl(myConfig); > > > >> > try { > > > >> > List<Future<Void>> futures = new LinkedList<>(); > > > >> > for (String topicName: myNewTopicNames) { > > > >> > creations.add(client.topics(). > > > >> > setClientTimeout(30000).setCreationConfig(myTopicConfig). > > > >> > create(topicName, 3, (short) 2, false)); > > > >> > } > > > >> > Futures.waitForAll(futures); > > > >> > } finally { > > > >> > client.close(); > > > >> > } > > > >> > > > >> I went with Futures because I feel like ought to have some option for > > > >> doing async. It's a style of programming that has become a lot more > > > >> popular with the rise of Node.js, Twisted python, etc. etc. Also, as > > > >> Ismael commented, Java 8 CompletableFuture is going to make Java's > > > >> support for fluent async programming a lot stronger by allowing call > > > >> chaining and much more. > > > >> > > > >> If we are going to support async, the simplest thing is just to make > > > >> everything return a future and let people call get() if they want to > > run > > > >> synchronously. Having a mix of async and sync APIs is just going to > > be > > > >> confusing and redundant. > > > >> > > > >> I think we should try to avoid creating single functions that start > > > >> multiple requests if we can. It makes things much uglier. It means > > > >> that you have to have some kind of request class that wraps up the > > > >> request the user is trying to create, so that you can handle an array > > of > > > >> those requests. The return value has to be something like Map<Node, > > > >> Try<Value>> to represent which nodes failed and succeeded. This is > > the > > > >> kind of stuff that, in my opinion, makes people scratch their heads. > > > >> > > > >> If we need to, we can still get some of the efficiency benefits of > > batch > > > >> APIs by waiting for a millisecond or two before sending out a topic > > > >> create() request to see if other create() requests arrive. If so, we > > > >> can coalesce them. It might be better to figure out if this is an > > > >> actual performance issue before implementing it, though. > > > >> > > > >> I think it would be good to get something out there, annotate it as > > > >> @Unstable, and get feedback from people building against trunk and > > using > > > >> it. We have removed or changed @Unstable APIs in streams before, so I > > > >> don't think we should worry that it will get set in stone prematurely. > > > >> The AdminClient API should get much less developer use than anything > > in > > > >> streams, so changing an unstable API should be much easier. > > > >> > > > >> best, > > > >> Colin > > > >> > > > >> > > > >> On Wed, Feb 8, 2017, at 07:49, Ismael Juma wrote: > > > >> > Thanks for elaborating Jay. I totally agree that we have to be very > > > >> > careful > > > >> > in how we use our complexity budget. Easier said than done when > > people > > > >> > don't agree on what is complex and what is simple. :) For example, I > > > >> > think > > > >> > batch APIs are a significant source of complexity as you have to do > > a > > > >> > bunch > > > >> > of ceremony to group things before sending the request and error > > > >> handling > > > >> > becomes more complex due to partial failures (things like `Try` or > > > other > > > >> > mechanisms that serve a similar role are then needed). > > > >> > > > > >> > Maybe a way forward is to write API usage examples to help validate > > > that > > > >> > the suggested API is indeed easy to use. > > > >> > > > > >> > Ismael > > > >> > > > > >> > On Wed, Feb 8, 2017 at 4:40 AM, Jay Kreps <j...@confluent.io> wrote: > > > >> > > > > >> > > Totally agree on CompletableFuture. Also agree with some of the > > > rough > > > >> edges > > > >> > > on the Consumer. > > > >> > > > > > >> > > I don't have much of a leg to stand on with the splitting vs not > > > >> splitting > > > >> > > thing, really hard to argue one or the other is better. I guess > > the > > > >> one > > > >> > > observation in watching us try to make good public apis over the > > > >> years is I > > > >> > > am kind of in favor of a particular kind of simple. In particular > > I > > > >> think > > > >> > > since the bar is sooo high in support and docs and the community > > of > > > >> users > > > >> > > so broad in the range of their capabilities, it makes it so there > > is > > > >> a lot > > > >> > > of value in dead simple interfaces that don't have a lot of > > > conceptual > > > >> > > weight, don't introduce a lot of new classes or concepts or > > general > > > >> > > patterns that must be understood to use them correctly. So things > > > like > > > >> > > nesting, or the Try class, or async apis, or even just a complex > > set > > > >> of > > > >> > > classes representing arguments or return values kind of all stack > > > >> > > conceptual burdens on the user to figure out correct usage. So > > like, > > > >> for > > > >> > > example, the Try class is very elegant and represents a whole > > > >> generalized > > > >> > > class of possibly completed actions, but the flip side is maybe > > I'm > > > >> just a > > > >> > > working guy who needs to list his kafka topics but doesn't know > > > Rust, > > > >> take > > > >> > > pity on me! :-) > > > >> > > > > > >> > > Nit picking aside, super excited to see us progress on this. > > > >> > > > > > >> > > -Jay > > > >> > > > > > >> > > On Tue, Feb 7, 2017 at 3:46 PM Ismael Juma <ism...@juma.me.uk> > > > wrote: > > > >> > > > > > >> > > > Hi Jay, > > > >> > > > > > > >> > > > Thanks for the feedback. Comments inline. > > > >> > > > > > > >> > > > On Tue, Feb 7, 2017 at 8:18 PM, Jay Kreps <j...@confluent.io> > > > wrote: > > > >> > > > > > > > >> > > > > - I think it would be good to not use "get" as the prefix > > for > > > >> things > > > >> > > > > making remote calls. We've tried to avoid the java getter > > > >> convention > > > >> > > > > entirely (see code style guide), but for remote calls in > > > >> particular > > > >> > > it > > > >> > > > > kind > > > >> > > > > of blurs the line between field access and remote RPC in a > > > way > > > >> that > > > >> > > > > leads > > > >> > > > > people to trouble. What about, e.g., fetchAllGroups() vs > > > >> > > > getAllGroups(). > > > >> > > > > > > > >> > > > > > > >> > > > Agreed that we should avoid the `get` prefix for remote calls. > > > >> There are > > > >> > > a > > > >> > > > few possible prefixes for the read operations: list, fetch, > > > >> describe. > > > >> > > > > > > >> > > > > > > >> > > > > - I think futures and callbacks are a bit of a pain to use. > > > I'd > > > >> > > second > > > >> > > > > Becket's comment: let's ensure there a common use case > > > >> motivating > > > >> > > > these > > > >> > > > > that wouldn't be just as easily satisfied with batch > > > operations > > > >> > > (which > > > >> > > > > we > > > >> > > > > seem to have at least for some things). In terms of > > > flexibility > > > >> > > > > Callbacks > > > > >> > > > > Futures > Batch Ops but I think in terms of usability it is > > > the > > > >> > > exact > > > >> > > > > opposite so let's make sure we have worked out how the API > > > >> will be > > > >> > > > used > > > >> > > > > before deciding. In particular I think java Futures are > > often > > > >> an > > > >> > > > > uncomfortable half-way point since calling get() and > > blocking > > > >> the > > > >> > > > > thread is > > > >> > > > > often not what you want for chaining sequences of > > operations > > > >> in a > > > >> > > > truly > > > >> > > > > async way, so 99% of people just use the future as a way to > > > >> batch > > > >> > > > calls. > > > >> > > > > > > > >> > > > > > > >> > > > We should definitely figure out how the APIs are going to be > > used > > > >> before > > > >> > > > deciding. I agree that callbacks are definitely painful and > > > there's > > > >> > > little > > > >> > > > reason to expose them in a modern API unless it's meant to be > > very > > > >> low > > > >> > > > level. When it comes to Futures, I think it's important to > > > >> distinguish > > > >> > > what > > > >> > > > is available in Java 7 and below versus what is available from > > > Java > > > >> 8 > > > >> > > > onwards. CompletableFuture makes it much easier to compose/chain > > > >> > > operations > > > >> > > > (in a similar vein to java.util.Stream, our own Streams API, > > etc.) > > > >> and it > > > >> > > > gives you the ability to register callbacks if you really want > > to > > > >> > > (avoiding > > > >> > > > the somewhat odd situation in the Producer where we return a > > > Future > > > >> _and_ > > > >> > > > allow you to pass a callback). > > > >> > > > > > > >> > > > > > > >> > > > > - Personally I don't think splitting the admin methods up > > > >> actually > > > >> > > > makes > > > >> > > > > things more usable. It just makes you have to dig through > > our > > > >> > > > > hierarchy. I > > > >> > > > > think a flat class with a bunch of operations (like the > > > >> consumer > > > >> > > api) > > > >> > > > is > > > >> > > > > probably the easiest for people to grok and find things > > on. I > > > >> am > > > >> > > kind > > > >> > > > > of a > > > >> > > > > dumb PHP programmer at heart, though. > > > >> > > > > > > > >> > > > > > > >> > > > I am not sure it's fair to compare the AdminClient with the > > > >> Consumer. The > > > >> > > > former has APIs for a bunch of unrelated APIs (topics, ACLs, > > > >> configs, > > > >> > > > consumer groups, delegation tokens, preferred leader election, > > > >> partition > > > >> > > > reassignment, etc.) where the latter is pretty specialised. For > > > >> each of > > > >> > > the > > > >> > > > resources, you may have 3-4 operations, it will get confusing > > > fast. > > > >> Also, > > > >> > > > do you really think an API that has one level of grouping will > > > mean > > > >> that > > > >> > > > users have to "dig through our hierarchy"? Or are you concerned > > > >> that once > > > >> > > > we go in that direction, there is a danger of making the > > hierarchy > > > >> more > > > >> > > > complicated? > > > >> > > > > > > >> > > > Finally, I am not sure I would use the consumer as an example of > > > >> > > something > > > >> > > > that is easy to grok. :) The fact that methods behave pretty > > > >> differently > > > >> > > > (some are blocking while others only have an effect after poll) > > > >> with no > > > >> > > > indication from the type signature or naming convention makes it > > > >> harder, > > > >> > > > not easier, to understand. > > > >> > > > > > > >> > > > Ismael > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > > > >