Thanks Jay -- some good ideas there. I agree strongly that fewer, more solid, non-Java clients are better than many shallow ones. Interesting that you feel we could do some more work in this area, as I thought it was well served (even if they have proliferated).
One area I would like to see documented better -- and I am considering it myself -- is a collection of Kafka "Architectural Design Patterns", all in one place. For example, how to use Kafka to build a staging and test environment (tapping the production flow in a non-destructive manner), how to build robust pipelines that move data to and from, say, Apache Storm, how to deploy a cluster in EC2 (the interaction with Availability Zones), topic vs. partition demuxing, etc. I've yet to see a nice consolidation of this information -- it would not really be about coding, but system design. Ideally it would be reviewed by you committers, but someone else would do the work.

Philip

---------------------------
www.philipotoole.com

On Friday, July 18, 2014 3:58 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

Basically my thought with getting a separate mailing list was to have a place specifically to discuss issues around clients. I don't see a lot of discussion about them on the main list. I thought perhaps this was because people don't like to ask questions which are about adjacent projects/code bases. But basically whatever will lead to a robust discussion, bug tracking, etc. on clients.

-Jay

On Fri, Jul 18, 2014 at 3:49 PM, Jun Rao <jun...@gmail.com> wrote:
> Another important part of the ecosystem could be the adaptors for
> getting data from other systems into Kafka and vice versa. So, for the
> ingestion part, this can include things like getting data from mysql,
> syslog, apache server log, etc. For the egress part, this can include
> putting Kafka data into HDFS, S3, etc.
>
> Will a separate mailing list be convenient? Could we just use the Kafka
> mailing list?
>
> Thanks,
>
> Jun
>
>
> On Fri, Jul 18, 2014 at 2:34 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> A question was asked in another thread about what was an effective way
>> to contribute to the Kafka project for people who weren't very
>> enthusiastic about writing Java/Scala code.
>>
>> I wanted to kind of advocate for an area I think is really important
>> and not as good as it could be--the client ecosystem. I think our goal
>> is to make Kafka effective as a general purpose, centralized, data
>> subscription system. This vision only really works if all your
>> applications are able to integrate easily, whatever language they are
>> in.
>>
>> We have a number of pretty good non-Java producers. We have been
>> lacking the features on the server-side to make writing non-Java
>> consumers easy. We are fixing that as part of the consumer work going
>> on right now (which moves a lot of the functionality in the Java
>> consumer to the server side).
>>
>> But apart from this I think there may be a lot more we can do to make
>> the client ecosystem better.
>>
>> Here are some concrete ideas. If anyone has additional ideas please
>> reply to this thread and share them. If you are interested in picking
>> any of these up, please do.
>>
>> 1. The most obvious way to improve the ecosystem is to help work on
>> clients. This doesn't necessarily mean writing new clients, since in
>> many cases we already have a client in a given language. I think we
>> should do anything we can to incentivize fewer, better clients rather
>> than many half-working clients. However, we are working now on the
>> server-side consumer co-ordination so it should now be possible to
>> write much simpler consumers.
>>
>> 2. It would be great if someone put together a mailing list just for
>> client developers to share tips, tricks, problems, and so on. We can
>> make sure all the main contributors are on this too. I think this
>> could be a forum for kind of directing improvements in this area.
>>
>> 3. Help improve the documentation on how to implement a client. We
>> have tried to make the protocol spec not just a dry document but also
>> have it share best practices, rationale, and intentions. I think this
>> could potentially be even better as there is really a range of options
>> from a very simple quick implementation to a more complex highly
>> optimized version. It would be good to really document some of the
>> options and tradeoffs.
>>
>> 4. Come up with a standard way of documenting the features of clients.
>> In an ideal world it would be possible to get the same information
>> (author, language, feature set, download link, source code, etc) for
>> all clients. It would be great to standardize the documentation for
>> the client as well. For example having one or two basic examples that
>> are repeated for every client in a standardized way. This would let
>> someone come to the Kafka site who is not a Java developer, and click
>> on the link for their language and view examples of interacting with
>> Kafka in the language they know using the client they would eventually
>> use.
>>
>> 5. Build a Kafka Client Compatibility Kit (KCCK) :-) The idea is this:
>> anyone who wants to implement a client would implement a simple
>> command line program with a set of standardized options. The
>> compatibility kit would be a standard set of scripts that run their
>> client using this command line driver and validate its behavior. E.g.
>> for a producer it would test that it can correctly send messages, that
>> the ordering is retained, that the client correctly handles
>> reconnection and metadata refresh, and compression. The output would
>> be a list of features that passed and are certified, and perhaps basic
>> performance information. This would be an easy way to help client
>> developers write correct clients, as well as having a standardized
>> comparison for the clients that says that they work correctly.
>>
>> -Jay
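
To make idea 5 in the quoted message a little more concrete, here is a rough sketch of what a compatibility harness might look like. Everything in it is an assumption for illustration: no such kit exists in the thread, and the driver convention (messages in on stdin, one per line, delivered messages reported on stdout), the names `run_driver`, `certify` and `CHECKS`, and the `ECHO_DRIVER` stand-in are all hypothetical. A real kit would drive an actual client against a broker and read the topic back; the stand-in just echoes its input so the harness can be run without Kafka.

```python
import subprocess
import sys
from typing import Callable, Dict, List

# Hypothetical stand-in for a client's command-line driver. It copies stdin
# to stdout, playing the role of "send these messages, then report what was
# delivered". A real driver would talk to a Kafka broker instead.
ECHO_DRIVER = [sys.executable, "-c",
               "import sys; sys.stdout.write(sys.stdin.read())"]


def run_driver(command: List[str], messages: List[str]) -> List[str]:
    """Feed messages to the driver, one per line, and return the lines it reports."""
    payload = "".join(message + "\n" for message in messages)
    result = subprocess.run(command, input=payload, capture_output=True,
                            text=True, timeout=30, check=True)
    return result.stdout.splitlines()


def check_delivery(sent: List[str], received: List[str]) -> bool:
    """Every message sent was reported back exactly once, in any order."""
    return sorted(sent) == sorted(received)


def check_ordering(sent: List[str], received: List[str]) -> bool:
    """Messages were reported back in the order they were sent."""
    return sent == received


# Assumed feature list; reconnection, metadata refresh and compression
# checks from the quoted message are left out of this sketch.
CHECKS: Dict[str, Callable[[List[str], List[str]], bool]] = {
    "delivery": check_delivery,
    "ordering": check_ordering,
}


def certify(command: List[str], messages: List[str]) -> Dict[str, bool]:
    """Run the driver once and return a pass/fail result per feature."""
    received = run_driver(command, messages)
    return {name: check(messages, received) for name, check in CHECKS.items()}


if __name__ == "__main__":
    report = certify(ECHO_DRIVER, [f"msg-{i}" for i in range(5)])
    for feature, passed in report.items():
        print(f"{feature}: {'PASS' if passed else 'FAIL'}")
```

The point of the design is that the harness only knows about a command line and a list of checks, so a client in any language could be plugged in by swapping the command, which is the language-neutral property the quoted proposal is after.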