+1. A very good proposal. It looks like it would help meet our need for a low-latency event messaging system, so I am looking forward to it.
I would love to contribute to the project and have added my name to the list of initial committers, if there is no objection.

- Henry

2011/6/22 Jun Rao <jun...@gmail.com>

> Hi,
>
> I would like to propose Kafka to be an Apache Incubator project. Kafka is a distributed, high-throughput, publish-subscribe system for processing large amounts of streaming data.
>
> Here's a link to the proposal in the Incubator wiki:
> http://wiki.apache.org/incubator/KafkaProposal
>
> I've also pasted the initial contents below.
>
> Thanks,
>
> Jun
>
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large amounts of streaming data.
>
> == Proposal ==
> Kafka provides an extremely high-throughput distributed publish/subscribe messaging system. Additionally, it supports relatively long-term persistence of messages to support a wide variety of consumers, partitioning of the message stream across servers and consumers, and functionality for loading data into Apache Hadoop for offline, batch processing.
>
> == Background ==
> Kafka was developed at LinkedIn to process the large volume of events generated by that company's website and provide a common repository for many types of consumers to access and process those events. Kafka has been used in production at LinkedIn scale to handle dozens of types of events including page views, searches and social network activity. Kafka clusters at LinkedIn currently process more than two billion events per day.
>
> Kafka fills the gap between messaging systems such as Apache ActiveMQ, which can provide high-volume messaging but lack persistence of those messages, and log processing systems such as Scribe and Flume, which do not provide adequate latency for our diverse set of consumers.
> Kafka can also be inserted into traditional log-processing systems, acting as an intermediate step before further processing. Kafka focuses relentlessly on performance and throughput by neither inspecting message contents nor indexing them on the broker. We also achieve high performance by depending on Java's sendFile/transferTo capabilities to minimize intermediate buffer copies and relying on the OS's page cache to efficiently serve up message contents to consumers.
>
> Kafka is written in Scala and depends on Apache ZooKeeper for coordination amongst its producers, brokers and consumers.
>
> Kafka was developed internally at LinkedIn to meet our particular use cases, but will be useful to many organizations facing a similar need to reliably process large amounts of streaming data. Therefore, we would like to share it with the ASF and begin developing a community of developers and users within Apache.
>
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system such as Kafka. While our use case of processing events from a very large website like LinkedIn has driven the design of Kafka, its uses are varied and we expect many new use cases to emerge. Kafka provides a natural bridge between near real-time event processing and offline batch processing and will appeal to many users.
>
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse developer community around Kafka following the Apache meritocracy model. Since Kafka was open sourced we have solicited contributions via the website and presentations given to user groups and technical audiences. We have had positive responses to these and have received several contributions and clients for other languages.
> We plan to continue this support for new contributors and work with those who contribute significantly to the project to make them committers.
>
> === Community ===
> Kafka is currently being developed by engineers within LinkedIn and used in production at that company. Additionally, we have active users in or have received contributions from a diverse set of companies including MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public presentations of Kafka and its goals garnered much interest from potential contributors. We hope to extend our contributor base significantly and invite all those who are interested in building high-throughput distributed systems to participate. We have begun receiving contributions from outside of LinkedIn, including clients for several languages, among them Ruby, PHP, Clojure, .NET and Python.
>
> To further this goal, we use GitHub issue tracking and branching facilities, and maintain a public mailing list via Google Groups.
>
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within Apache as a Cassandra committer and PMC member. Neha has been an active contributor to several projects LinkedIn has open sourced, including Bobo, Sensei and Zoie. Jay has experience with open source software as the originator of Project Voldemort, as well as being active within the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC member and a previous Apache ZooKeeper contributor.
>
> === Alignment ===
> The ASF is the natural choice to host the Kafka project as its goal of encouraging community-driven open-source projects fits with our vision for Kafka.
> Additionally, many other projects with which we are familiar and with which we expect Kafka to integrate, such as Apache Hadoop, Pig, ZooKeeper and log4j, are hosted by the ASF, and we will both benefit from and provide benefit through close proximity to them.
>
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very little risk of Kafka being abandoned as it is a critical part of LinkedIn's internal infrastructure and is in production use.
>
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development. LinkedIn open sourced Kafka several months ago and has been receiving contributions since. Jun is an Apache Cassandra committer and PMC member. Jay and Neha have been involved with several open source projects released by LinkedIn. Jakob has been actively involved with the ASF as a full-time Hadoop committer and PMC member.
>
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn. However, we hope to establish a developer community that includes contributors from several corporations and we are actively encouraging new contributors via the mailing lists and public presentations of Kafka.
>
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Kafka. However, once the project has a community built around it, we expect to get committers, developers and community from outside the current core developers. That said, because LinkedIn relies on Kafka internally, the reliance on salaried developers is unlikely to change.
>
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper to coordinate its state amongst the brokers, consumers, and soon, the producers.
> Kafka provides input formats to allow Hadoop MapReduce to load data directly from Kafka. Kafka provides an appender to allow consuming data directly from Apache log4j.
>
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that it will attract contributors and users, our interest is primarily to give Kafka a solid home as an open source project following an established development model. We have also given reasons in the Rationale and Alignment sections.
>
> == Documentation ==
> Information about Kafka can be found at [http://sna-projects.com/kafka/]. The following links provide more information about the project:
>
> * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
> * The GitHub site: [https://github.com/kafka-dev/kafka]
> * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
> * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
> * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009. It was open sourced by LinkedIn in January 2011. It is currently hosted on GitHub under the Apache license at [https://github.com/kafka-dev/kafka]
>
> Kafka is mainly written in Scala with some performance testing code in Java. Several clients have been contributed in other languages, including Ruby, PHP, Clojure, .NET and Python. Its source tree is entirely self-contained and relies on the simple build tool (sbt) as its build system and dependency resolution mechanism.
>
> == External Dependencies ==
> The dependencies all have Apache-compatible licenses.
>
> == Cryptography ==
> Not applicable.
>
> == Required Resources ==
> === Mailing Lists ===
> * kafka-private for private PMC discussions (with moderated subscriptions)
> * kafka-dev
> * kafka-commits
> * kafka-user
>
> === Subversion Directory ===
> [https://svn.apache.org/repos/asf/incubator/kafka]
>
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
>
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance to run them whenever a new patch is submitted. This can be added after project creation.
>
> == Initial Committers ==
> * Jay Kreps
> * Jun Rao
> * Neha Narkhede
> * Jakob Homan
>
> == Affiliations ==
> * Jay Kreps (LinkedIn)
> * Jun Rao (LinkedIn)
> * Neha Narkhede (LinkedIn)
> * Jakob Homan (LinkedIn)
>
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
>
> === Nominated Mentors ===
> * Alan Cabrera (Apache Member)
> * Geir Magnusson, Jr. (Apache Member and Director)
> * Owen O'Malley (Apache Member)
>
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org