On Mon, Feb 13, 2017, at 23:04, radai wrote: > 1. making the client Closeable/AutoCloseable would allow try (Client = > ...) > {} without the need to finally close.
Good idea... let's make the interface extend AutoCloseable. > > 2. a "stream processing unit" (producer + consumer) currently holds 2 > open > sockets to every broker it interacts with, because producer and consumer > dont share the network stack. if we use the admin API to auto cleanup on > commit for intermediate pipelines (which is one of our use cases) this > figure goes up to 3 sockets per unit of processing per broker. beyond > becoming a scalability issue this (i think) might also introduce annoying > bugs due to some (but not all) of these connections being down. this is > not > an issue of this KIP though. Right, that is out of scope for this KIP, which is just about public APIs. It's worth thinking about for the future, though. best, Colin > > On Mon, Feb 13, 2017 at 11:51 AM, Colin McCabe <cmcc...@apache.org> > wrote: > > > On Sun, Feb 12, 2017, at 09:21, Jay Kreps wrote: > > > Hey Colin, > > > > > > Thanks for the hard work on this. I know going back and forth on APIs is > > > kind of frustrating but we're at the point where these things live long > > > enough and are used by enough people that it is worth the pain. I'm sure > > > it'll come down in the right place eventually. A couple things I've found > > > helped in the past: > > > > > > 1. The burden of evidence needs to fall on the complicator. i.e. if > > > person X thinks the api should be async they need to produce a set of > > > common use cases that require this. Otherwise you are perpetually > > > having to > > > think "we might need x". I think it is good to have a rule of "simple > > > until > > > proven insufficient". > > > 2. Make sure we frame things for the intended audience. At this point > > > our apis get used by a very broad set of Java engineers. This is a > > > very > > > different audience from our developer mailing list. These people code > > > for a > > > living not necessarily as a passion, and may not understand details of > > > the > > > internals of our system or even basic things like multi-threaded > > > programming. I don't think this means we want to dumb things down, but > > > rather try really hard to make things truly simple when possible. > > > > > > Okay here were a couple of comments: > > > > > > 1. Conceptually what is a TopicContext? I think it means something > > > like > > > TopicAdmin? It is not literally context about Topics right? What is > > > the > > > relationship of Contexts to clients? Is there a threadsafety > > > difference? > > > Would be nice to not have to think about this, this is what I mean by > > > "conceptual weight". We introduce a new concept that is a bit nebulous > > > that > > > I have to figure out to use what could be a simple api. I'm sure > > > you've > > > been through this experience before where you have these various > > > objects > > > and you're trying to figure out what they represent (the connection to > > > the > > > server? the information to create a connection? a request session?). > > > > The intention was to provide some grouping of methods, and also a place > > to put request parameters which were often set to defaults rather than > > being explicitly set. If it seems complex, we can certainly get rid of > > it. > > > > > 2. We've tried to avoid the Impl naming convention. In general the > > > rule > > > has been if there is only going to be one implementation you don't > > > need an > > > interface. If there will be multiple, distinguish it from the others. > > > The > > > other clients follow this pattern: Producer, KafkaProducer, > > > MockProducer; > > > Consumer, KafkaConsumer, MockConsumer. > > > > Good point. Let's change the interface to KafkaAdminClient, and the > > implementation to AdminClient. > > > > > 3. We generally don't use setters or getters as a naming convention. I > > > personally think mutating the setting in place seems kind of like late > > > 90s > > > Java style. I think it likely has thread-safety issues. i.e. even if > > > it is > > > volatile you may not get the value you just set if there is another > > > thread... I actually really liked what you described as your original > > > idea > > > of having a single parameter object like CreateTopicRequest that holds > > > all > > > these parameters and defaults. This lets you evolve the api with all > > > the > > > various combinations of arguments without overloading insanity. After > > > doing > > > literally tens of thousands of remote APIs at LinkedIn we eventually > > > converged on a rule, which is ultimately every remote api needs a > > > single > > > argument object you can add to over time and it must be batched. Which > > > brings me to my next point... > > > > Just to clarify, volatiles were never a part of the proposal. I think > > that context objects or request objects should be used by a single > > thread at a time. > > > > I'm not opposed to request objects, but I think they raise all the same > > questions as context objects. Basically, the thread-safety issues need > > to be spelled out and understood by the user, and the user needs more > > lines of code to make a request. And there will be people trying to do > > things like re-use request objects when they should not, and so forth. > > > > > 4. I agree batch apis are annoying but I suspect we'll end up adding > > > one. Doing 1000 requests for 1000 operations if creating or deleting > > > will > > > be bad, right? This won't be the common case, but when you do it it > > > will be > > > a deal-breaker problem. I don't think we should try to fix this one > > > behind > > > the scenes. > > > 5. Are we going to do CompletableFuture (which requires java 8) or > > > normal Future? Normal Future is utterly useless for most things other > > > than > > > just calling wait. If we can evolve in place from Future to > > > CompletableFuture that is fantastic (we could do it for the producer > > > too!). > > > My belief was that this was binary incompatible but I actually don't > > > know > > > (obviously it's source compatible). > > > > In my testing, replacing a return type with a subclass of that return > > type did not break binary compatibility. I haven't been able to find > > chapter and verse on this from the Java implementers, though. > > > > best, > > Colin > > > > > > > > > > -Jay > > > > > > On Wed, Feb 8, 2017 at 5:00 PM, Colin McCabe <cmcc...@apache.org> wrote: > > > > > > > Hi all, > > > > > > > > I made some major revisions to the proposal on the wiki, so please > > check > > > > it out. > > > > > > > > The new API is based on Ismael's suggestion of grouping related APIs. > > > > There is only one layer of grouping. I think that it's actually pretty > > > > intuitive. It's also based on the idea of using Futures, which several > > > > people commented that they'd like to see. > > > > > > > > Here's a simple example: > > > > > > > > > AdminClient client = new AdminClientImpl(myConfig); > > > > > try { > > > > > client.topics().create("foo", 3, (short) 2, false).get(); > > > > > Collection<String> topicNames = client.topics().list(false). > > get(); > > > > > log.info("Found topics: {}", Utils.mkString(topicNames, ", ")); > > > > > Collection<Node> nodes = client.nodes().list().get(); > > > > > log.info("Found cluster nodes: {}", Utils.mkString(nodes, ", ")); > > > > > } finally { > > > > > client.close(); > > > > > } > > > > > > > > The good thing is, there is no Try, no 'get' prefixes, no messing with > > > > batch APIs. If there is an error, then Future#get() throws an > > > > ExecutionException which wraps the relevant exception in the standard > > > > Java way. > > > > > > > > Here's a slightly less simple example: > > > > > > > > > AdminClient client = new AdminClientImpl(myConfig); > > > > > try { > > > > > List<Future<Void>> futures = new LinkedList<>(); > > > > > for (String topicName: myNewTopicNames) { > > > > > creations.add(client.topics(). > > > > > setClientTimeout(30000).setCreationConfig(myTopicConfig). > > > > > create(topicName, 3, (short) 2, false)); > > > > > } > > > > > Futures.waitForAll(futures); > > > > > } finally { > > > > > client.close(); > > > > > } > > > > > > > > I went with Futures because I feel like ought to have some option for > > > > doing async. It's a style of programming that has become a lot more > > > > popular with the rise of Node.js, Twisted python, etc. etc. Also, as > > > > Ismael commented, Java 8 CompletableFuture is going to make Java's > > > > support for fluent async programming a lot stronger by allowing call > > > > chaining and much more. > > > > > > > > If we are going to support async, the simplest thing is just to make > > > > everything return a future and let people call get() if they want to > > run > > > > synchronously. Having a mix of async and sync APIs is just going to be > > > > confusing and redundant. > > > > > > > > I think we should try to avoid creating single functions that start > > > > multiple requests if we can. It makes things much uglier. It means > > > > that you have to have some kind of request class that wraps up the > > > > request the user is trying to create, so that you can handle an array > > of > > > > those requests. The return value has to be something like Map<Node, > > > > Try<Value>> to represent which nodes failed and succeeded. This is the > > > > kind of stuff that, in my opinion, makes people scratch their heads. > > > > > > > > If we need to, we can still get some of the efficiency benefits of > > batch > > > > APIs by waiting for a millisecond or two before sending out a topic > > > > create() request to see if other create() requests arrive. If so, we > > > > can coalesce them. It might be better to figure out if this is an > > > > actual performance issue before implementing it, though. > > > > > > > > I think it would be good to get something out there, annotate it as > > > > @Unstable, and get feedback from people building against trunk and > > using > > > > it. We have removed or changed @Unstable APIs in streams before, so I > > > > don't think we should worry that it will get set in stone prematurely. > > > > The AdminClient API should get much less developer use than anything in > > > > streams, so changing an unstable API should be much easier. > > > > > > > > best, > > > > Colin > > > > > > > > > > > > On Wed, Feb 8, 2017, at 07:49, Ismael Juma wrote: > > > > > Thanks for elaborating Jay. I totally agree that we have to be very > > > > > careful > > > > > in how we use our complexity budget. Easier said than done when > > people > > > > > don't agree on what is complex and what is simple. :) For example, I > > > > > think > > > > > batch APIs are a significant source of complexity as you have to do a > > > > > bunch > > > > > of ceremony to group things before sending the request and error > > handling > > > > > becomes more complex due to partial failures (things like `Try` or > > other > > > > > mechanisms that serve a similar role are then needed). > > > > > > > > > > Maybe a way forward is to write API usage examples to help validate > > that > > > > > the suggested API is indeed easy to use. > > > > > > > > > > Ismael > > > > > > > > > > On Wed, Feb 8, 2017 at 4:40 AM, Jay Kreps <j...@confluent.io> wrote: > > > > > > > > > > > Totally agree on CompletableFuture. Also agree with some of the > > rough > > > > edges > > > > > > on the Consumer. > > > > > > > > > > > > I don't have much of a leg to stand on with the splitting vs not > > > > splitting > > > > > > thing, really hard to argue one or the other is better. I guess > > the one > > > > > > observation in watching us try to make good public apis over the > > years > > > > is I > > > > > > am kind of in favor of a particular kind of simple. In particular I > > > > think > > > > > > since the bar is sooo high in support and docs and the community of > > > > users > > > > > > so broad in the range of their capabilities, it makes it so there > > is a > > > > lot > > > > > > of value in dead simple interfaces that don't have a lot of > > conceptual > > > > > > weight, don't introduce a lot of new classes or concepts or general > > > > > > patterns that must be understood to use them correctly. So things > > like > > > > > > nesting, or the Try class, or async apis, or even just a complex > > set of > > > > > > classes representing arguments or return values kind of all stack > > > > > > conceptual burdens on the user to figure out correct usage. So > > like, > > > > for > > > > > > example, the Try class is very elegant and represents a whole > > > > generalized > > > > > > class of possibly completed actions, but the flip side is maybe I'm > > > > just a > > > > > > working guy who needs to list his kafka topics but doesn't know > > Rust, > > > > take > > > > > > pity on me! :-) > > > > > > > > > > > > Nit picking aside, super excited to see us progress on this. > > > > > > > > > > > > -Jay > > > > > > > > > > > > On Tue, Feb 7, 2017 at 3:46 PM Ismael Juma <ism...@juma.me.uk> > > wrote: > > > > > > > > > > > > > Hi Jay, > > > > > > > > > > > > > > Thanks for the feedback. Comments inline. > > > > > > > > > > > > > > On Tue, Feb 7, 2017 at 8:18 PM, Jay Kreps <j...@confluent.io> > > wrote: > > > > > > > > > > > > > > > > - I think it would be good to not use "get" as the prefix > > for > > > > things > > > > > > > > making remote calls. We've tried to avoid the java getter > > > > convention > > > > > > > > entirely (see code style guide), but for remote calls in > > > > particular > > > > > > it > > > > > > > > kind > > > > > > > > of blurs the line between field access and remote RPC in a > > way > > > > that > > > > > > > > leads > > > > > > > > people to trouble. What about, e.g., fetchAllGroups() vs > > > > > > > getAllGroups(). > > > > > > > > > > > > > > > > > > > > > > Agreed that we should avoid the `get` prefix for remote calls. > > There > > > > are > > > > > > a > > > > > > > few possible prefixes for the read operations: list, fetch, > > describe. > > > > > > > > > > > > > > > > > > > > > > - I think futures and callbacks are a bit of a pain to use. > > I'd > > > > > > second > > > > > > > > Becket's comment: let's ensure there a common use case > > > > motivating > > > > > > > these > > > > > > > > that wouldn't be just as easily satisfied with batch > > operations > > > > > > (which > > > > > > > > we > > > > > > > > seem to have at least for some things). In terms of > > flexibility > > > > > > > > Callbacks > > > > > > > > > Futures > Batch Ops but I think in terms of usability it is > > the > > > > > > exact > > > > > > > > opposite so let's make sure we have worked out how the API > > will > > > > be > > > > > > > used > > > > > > > > before deciding. In particular I think java Futures are > > often an > > > > > > > > uncomfortable half-way point since calling get() and > > blocking > > > > the > > > > > > > > thread is > > > > > > > > often not what you want for chaining sequences of > > operations in > > > > a > > > > > > > truly > > > > > > > > async way, so 99% of people just use the future as a way to > > > > batch > > > > > > > calls. > > > > > > > > > > > > > > > > > > > > > > We should definitely figure out how the APIs are going to be used > > > > before > > > > > > > deciding. I agree that callbacks are definitely painful and > > there's > > > > > > little > > > > > > > reason to expose them in a modern API unless it's meant to be > > very > > > > low > > > > > > > level. When it comes to Futures, I think it's important to > > > > distinguish > > > > > > what > > > > > > > is available in Java 7 and below versus what is available from > > Java 8 > > > > > > > onwards. CompletableFuture makes it much easier to compose/chain > > > > > > operations > > > > > > > (in a similar vein to java.util.Stream, our own Streams API, > > etc.) > > > > and it > > > > > > > gives you the ability to register callbacks if you really want to > > > > > > (avoiding > > > > > > > the somewhat odd situation in the Producer where we return a > > Future > > > > _and_ > > > > > > > allow you to pass a callback). > > > > > > > > > > > > > > > > > > > > > > - Personally I don't think splitting the admin methods up > > > > actually > > > > > > > makes > > > > > > > > things more usable. It just makes you have to dig through > > our > > > > > > > > hierarchy. I > > > > > > > > think a flat class with a bunch of operations (like the > > consumer > > > > > > api) > > > > > > > is > > > > > > > > probably the easiest for people to grok and find things on. > > I am > > > > > > kind > > > > > > > > of a > > > > > > > > dumb PHP programmer at heart, though. > > > > > > > > > > > > > > > > > > > > > > I am not sure it's fair to compare the AdminClient with the > > > > Consumer. The > > > > > > > former has APIs for a bunch of unrelated APIs (topics, ACLs, > > configs, > > > > > > > consumer groups, delegation tokens, preferred leader election, > > > > partition > > > > > > > reassignment, etc.) where the latter is pretty specialised. For > > each > > > > of > > > > > > the > > > > > > > resources, you may have 3-4 operations, it will get confusing > > fast. > > > > Also, > > > > > > > do you really think an API that has one level of grouping will > > mean > > > > that > > > > > > > users have to "dig through our hierarchy"? Or are you concerned > > that > > > > once > > > > > > > we go in that direction, there is a danger of making the > > hierarchy > > > > more > > > > > > > complicated? > > > > > > > > > > > > > > Finally, I am not sure I would use the consumer as an example of > > > > > > something > > > > > > > that is easy to grok. :) The fact that methods behave pretty > > > > differently > > > > > > > (some are blocking while others only have an effect after poll) > > with > > > > no > > > > > > > indication from the type signature or naming convention makes it > > > > harder, > > > > > > > not easier, to understand. > > > > > > > > > > > > > > Ismael > > > > > > > > > > > > > > > > > > >