Hey Philip,

Yeah, I think we have actually done pretty well at getting reasonably solid clients in a bunch of languages. I just think it is an important area.
The architecture design patterns idea is fantastic. That would be a great thing to do.

-Jay

On Fri, Jul 18, 2014 at 11:46 PM, Philip O'Toole <philip_o_to...@yahoo.com.invalid> wrote:
> Thanks, Jay -- some good ideas there.
>
> I agree strongly that fewer, more solid, non-Java clients are better than
> many shallow ones. Interesting that you feel we could do some more work in
> this area, as I thought it was well served (even if they have proliferated).
>
> One area I would like to see documented better -- and I am considering it myself
> -- is a collection of Kafka "Architectural Design Patterns", all in one
> place. For example, how to use Kafka to build a staging and test environment
> (tapping the production flow in a non-destructive manner), how to build
> robust pipelines that read from and write to, say, Apache Storm, how to deploy a
> cluster in EC2 (the interaction with Availability Zones), topic vs. partition
> demuxing, etc. I've yet to see a nice consolidation of this information
> -- it would not really be about coding, but about system design. Ideally it would
> be reviewed by you committers, but someone else would do the work.
>
> Philip
>
> ---------------------------
> www.philipotoole.com
>
> On Friday, July 18, 2014 3:58 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> Basically my thought with getting a separate mailing list was to have
> a place specifically to discuss issues around clients. I don't see a
> lot of discussion about them on the main list. I thought perhaps this
> was because people don't like to ask questions about
> adjacent projects/code bases. But basically, whatever will lead to a
> robust discussion, bug tracking, etc. on clients.
>
> -Jay
>
> On Fri, Jul 18, 2014 at 3:49 PM, Jun Rao <jun...@gmail.com> wrote:
>> Another important part of the ecosystem could be the adaptors for
>> getting data from other systems into Kafka and vice versa.
>> For the ingestion part, this can include things like getting data from MySQL,
>> syslog, Apache server logs, etc. For the egress part, this can include
>> putting Kafka data into HDFS, S3, etc.
>>
>> Would a separate mailing list be convenient? Could we just use the Kafka
>> mailing list?
>>
>> Thanks,
>>
>> Jun
>>
>> On Fri, Jul 18, 2014 at 2:34 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>
>>> A question was asked in another thread about what would be an effective way
>>> to contribute to the Kafka project for people who weren't very
>>> enthusiastic about writing Java/Scala code.
>>>
>>> I wanted to advocate for an area I think is really important
>>> and not as good as it could be--the client ecosystem. I think our goal
>>> is to make Kafka effective as a general-purpose, centralized data
>>> subscription system. This vision only really works if all your
>>> applications are able to integrate easily, whatever language they are
>>> in.
>>>
>>> We have a number of pretty good non-Java producers. We have been
>>> lacking the server-side features that would make writing non-Java
>>> consumers easy. We are fixing that right now as part of the ongoing
>>> consumer work (which moves a lot of the functionality in the
>>> Java consumer to the server side).
>>>
>>> But apart from this, I think there may be a lot more we can do to make
>>> the client ecosystem better.
>>>
>>> Here are some concrete ideas. If anyone has additional ideas, please
>>> reply to this thread and share them. If you are interested in picking
>>> up any of these, please do.
>>>
>>> 1. The most obvious way to improve the ecosystem is to help work on
>>> clients. This doesn't necessarily mean writing new clients, since in
>>> many cases we already have a client in a given language. I think we
>>> should do whatever we can to incentivize fewer, better clients rather
>>> than many half-working ones.
>>> However, we are now working on server-side consumer coordination, so it
>>> should soon be possible to write much simpler consumers.
>>>
>>> 2. It would be great if someone put together a mailing list just for
>>> client developers to share tips, tricks, problems, and so on. We can
>>> make sure all the main contributors are on it too. I think this could be
>>> a forum for directing improvements in this area.
>>>
>>> 3. Help improve the documentation on how to implement a client. We
>>> have tried to make the protocol spec not just a dry document but also
>>> have it share best practices, rationale, and intentions. I think this
>>> could be even better, as there is really a range of options,
>>> from a very simple, quick implementation to a more complex, highly
>>> optimized version. It would be good to really document some of the
>>> options and tradeoffs.
>>>
>>> 4. Come up with a standard way of documenting the features of clients.
>>> In an ideal world it would be possible to get the same information
>>> (author, language, feature set, download link, source code, etc.) for
>>> all clients. It would be great to standardize the documentation for
>>> the clients as well, for example by having one or two basic examples
>>> repeated for every client in a standardized way. This would let
>>> someone who is not a Java developer come to the Kafka site, click
>>> on the link for their language, and view examples of interacting with
>>> Kafka in the language they know, using the client they would eventually
>>> use.
>>>
>>> 5. Build a Kafka Client Compatibility Kit (KCCK) :-) The idea is this:
>>> anyone who wants to implement a client would implement a simple
>>> command-line program with a set of standardized options. The
>>> compatibility kit would be a standard set of scripts that run their
>>> client through this command-line driver and validate its behavior. E.g.
>>> for a producer it would test that it can correctly send messages, that
>>> ordering is retained, and that the client correctly handles
>>> reconnection, metadata refresh, and compression. The output would
>>> be a list of the features that passed and are certified, and perhaps basic
>>> performance information. This would be an easy way to help client
>>> developers write correct clients, as well as to provide a standardized
>>> comparison showing which clients work correctly.
>>>
>>> -Jay
>>>
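As a rough illustration of the ordering check and the "list of features that passed" in idea 5, here is a minimal Python sketch. It assumes a hypothetical driver that reports every message as a (partition, value) pair, both for what it sent and for what it read back; the function names `check_delivery_order` and `certify` are made up for illustration and are not part of any Kafka tool. Kafka only guarantees ordering within a partition, so the check compares messages per partition and ignores interleaving across partitions.

```python
from collections import defaultdict


def check_delivery_order(sent, received):
    """Return True if every partition received exactly what was sent to it,
    in the same order. Interleaving across partitions is allowed.

    sent, received: lists of (partition, value) tuples. This record format is
    a hypothetical stand-in for whatever a standardized driver would report.
    """
    sent_by_partition = defaultdict(list)
    for partition, value in sent:
        sent_by_partition[partition].append(value)

    received_by_partition = defaultdict(list)
    for partition, value in received:
        received_by_partition[partition].append(value)

    # Per-partition sequences must match exactly: no loss, no duplicates,
    # no reordering.
    for partition, values in sent_by_partition.items():
        if received_by_partition.get(partition, []) != values:
            return False

    # Data showing up on a partition we never wrote to is also a failure.
    return set(received_by_partition) <= set(sent_by_partition)


def certify(results):
    """Turn a {feature_name: passed} mapping into the sorted list of
    features that passed (the hypothetical 'certified' list)."""
    return sorted(name for name, passed in results.items() if passed)
```

For example, sending `[(0, "a"), (0, "b"), (1, "c")]` and reading back `[(1, "c"), (0, "a"), (0, "b")]` passes, because order is preserved within each partition; reading back `(0, "b")` before `(0, "a")` fails. A real kit would also need to cover the reconnection, metadata-refresh, and compression cases, which this sketch does not attempt.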