Re: [PROPOSAL] Kafka for the Apache Incubator

Henry Saputra Fri, 24 Jun 2011 11:17:44 -0700

+1

A very good proposal, and it seems to help solve our need for a
low-latency event messaging system, so I am looking forward to it.


I would love to contribute to the project and have added my name to the
list of initial committers, if there is no objection.

- Henry

>> 2011/6/22 Jun Rao <jun...@gmail.com>
>>
>> > Hi,
>> >
>> > I would like to propose Kafka to be an Apache Incubator project.  Kafka
>> > is a distributed, high-throughput, publish-subscribe system for
>> > processing large amounts of streaming data.
>> >
>> > Here's a link to the proposal in the Incubator wiki
>> > http://wiki.apache.org/incubator/KafkaProposal
>> >
>> > I've also pasted the initial contents below.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > == Abstract ==
>> > Kafka is a distributed publish-subscribe system for processing large
>> > amounts of streaming data.
>> >
>> > == Proposal ==
>> > Kafka provides an extremely high throughput distributed publish/subscribe
>> > messaging system.  Additionally, it supports relatively long-term
>> > persistence of messages to support a wide variety of consumers,
>> > partitioning of the message stream across servers and consumers, and
>> > functionality for loading data into Apache Hadoop for offline batch
>> > processing.
>> >
>> > == Background ==
>> > Kafka was developed at LinkedIn to process the large volume of events
>> > generated by that company's website and to provide a common repository
>> > for many types of consumers to access and process those events. Kafka has
>> > been used in production at LinkedIn scale to handle dozens of types of
>> > events, including page views, searches and social network activity.
>> > Kafka clusters at LinkedIn currently process more than two billion events
>> > per day.
>> >
>> > Kafka fills the gap between messaging systems such as Apache ActiveMQ,
>> > which can provide high-volume messaging but lack persistence of those
>> > messages, and log processing systems such as Scribe and Flume, which do
>> > not provide adequate latency for our diverse set of consumers.  Kafka can
>> > also be inserted into traditional log-processing systems, acting as an
>> > intermediate step before further processing. Kafka focuses relentlessly
>> > on performance and throughput by neither inspecting message content nor
>> > indexing messages on the broker.  We also achieve high performance by
>> > depending on Java's sendFile/transferTo capabilities to minimize
>> > intermediate buffer copies and by relying on the OS's page cache to
>> > efficiently serve up message contents to consumers.
>> >
>> > Kafka is written in Scala and depends on Apache ZooKeeper for
>> > coordination amongst its producers, brokers and consumers.
>> >
>> > Kafka was developed internally at LinkedIn to meet our particular use
>> > cases, but will be useful to many organizations facing a similar need to
>> > reliably process large amounts of streaming data.  Therefore, we would
>> > like to share it with the ASF and begin developing a community of
>> > developers and users within Apache.
>> >
>> > == Rationale ==
>> > Many organizations can benefit from a reliable stream processing system
>> > such as Kafka.  While our use case of processing events from a very large
>> > website like LinkedIn has driven the design of Kafka, its uses are varied
>> > and we expect many new use cases to emerge.  Kafka provides a natural
>> > bridge between near real-time event processing and offline batch
>> > processing and will appeal to many users.
>> >
>> > == Current Status ==
>> > === Meritocracy ===
>> > Our intent with this incubator proposal is to start building a diverse
>> > developer community around Kafka following the Apache meritocracy model.
>> > Since Kafka was open sourced we have solicited contributions via the
>> > website and presentations given to user groups and technical audiences.
>> > We have had positive responses to these and have received several
>> > contributions and clients for other languages.  We plan to continue this
>> > support for new contributors and work with those who contribute
>> > significantly to the project to make them committers.
>> >
>> > === Community ===
>> > Kafka is currently being developed by engineers within LinkedIn and is
>> > used in production at that company. Additionally, we have active users in
>> > or have received contributions from a diverse set of companies including
>> > MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
>> > presentations of Kafka and its goals garnered much interest from
>> > potential contributors. We hope to extend our contributor base
>> > significantly and invite all those who are interested in building
>> > high-throughput distributed systems to participate.  We have begun
>> > receiving contributions from outside of LinkedIn, including clients for
>> > several languages such as Ruby, PHP, Clojure, .NET and Python.
>> >
>> > To further this goal, we use GitHub's issue tracking and branching
>> > facilities and maintain a public mailing list via Google Groups.
>> >
>> > === Core Developers ===
>> > Kafka is currently being developed by four engineers at LinkedIn: Neha
>> > Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
>> > Apache as a Cassandra committer and PMC member. Neha has been an active
>> > contributor to several projects LinkedIn has open sourced, including
>> > Bobo, Sensei and Zoie. Jay has experience with open source software as
>> > the originator of Project Voldemort, as well as being active within the
>> > Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
>> > member and a previous Apache ZooKeeper contributor.
>> >
>> > === Alignment ===
>> > The ASF is the natural choice to host the Kafka project as its goal of
>> > encouraging community-driven open-source projects fits with our vision
>> > for Kafka.  Additionally, many other projects with which we are familiar
>> > and with which we expect Kafka to integrate, such as Apache Hadoop, Pig,
>> > ZooKeeper and log4j, are hosted by the ASF, and we will benefit and
>> > provide benefit by close proximity to them.
>> >
>> > == Known Risks ==
>> > === Orphaned Products ===
>> > The core developers plan to work full time on the project. There is very
>> > little risk of Kafka being abandoned as it is a critical part of
>> > LinkedIn's internal infrastructure and is in production use.
>> >
>> > === Inexperience with Open Source ===
>> > All of the core developers have experience with open source development.
>> > LinkedIn open sourced Kafka several months ago and has been receiving
>> > contributions since.  Jun is an Apache Cassandra committer and PMC
>> > member.  Jay and Neha have been involved with several open source
>> > projects released by LinkedIn.  Jakob has been actively involved with the
>> > ASF as a full-time Hadoop committer and PMC member.
>> >
>> > === Homogeneous Developers ===
>> > The current core developers are all from LinkedIn. However, we hope to
>> > establish a developer community that includes contributors from several
>> > corporations and we are actively encouraging new contributors via the
>> > mailing lists and public presentations of Kafka.
>> >
>> > === Reliance on Salaried Developers ===
>> > Currently, the developers are paid to do work on Kafka. Once the project
>> > has a community built around it, we expect to gain committers, developers
>> > and community from outside the current core developers. However, because
>> > LinkedIn relies on Kafka internally, the reliance on salaried developers
>> > is unlikely to change.
>> >
>> > === Relationships with Other Apache Products ===
>> > Kafka is deeply integrated with Apache products. Kafka uses Apache
>> > ZooKeeper to coordinate its state amongst the brokers, consumers, and
>> > soon, the producers.  Kafka provides input formats to allow Hadoop
>> > MapReduce to load data directly from Kafka.  Kafka provides an appender
>> > to allow consuming data directly from Apache log4j.
>> >
>> > === An Excessive Fascination with the Apache Brand ===
>> > While we respect the reputation of the Apache brand and have no doubts
>> > that it will attract contributors and users, our interest is primarily to
>> > give Kafka a solid home as an open source project following an
>> > established development model. We have also given reasons in the
>> > Rationale and Alignment sections.
>> >
>> > == Documentation ==
>> > Information about Kafka can be found at [http://sna-projects.com/kafka/].
>> > The following links provide more information about the project:
>> >
>> >  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>> >  * The GitHub site: [https://github.com/kafka-dev/kafka]
>> >  * Kafka overview from Jay Kreps:
>> >    [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>> >  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>> >  * Kafka paper at NetDB 2011:
>> >    [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>> >
>> > == Initial Source ==
>> > Kafka has been under development at LinkedIn since November 2009.  It was
>> > open sourced by LinkedIn in January 2011.  It is currently hosted on
>> > GitHub under the Apache license at [https://github.com/kafka-dev/kafka]
>> >
>> > Kafka is mainly written in Scala with some performance testing code in
>> > Java.  Several clients have been contributed in other languages,
>> > including Ruby, PHP, Clojure, .NET and Python.  Its source tree is
>> > entirely self-contained and relies on the simple build tool (sbt) as its
>> > build system and dependency resolution mechanism.
>> >
>> > == External Dependencies ==
>> > The dependencies all have Apache compatible licenses.
>> >
>> > == Cryptography ==
>> > Not applicable.
>> >
>> > == Required Resources ==
>> > === Mailing Lists ===
>> >  * kafka-private for private PMC discussions (with moderated
>> >    subscriptions)
>> >  * kafka-dev
>> >  * kafka-commits
>> >  * kafka-user
>> >
>> > === Subversion Directory ===
>> > [https://svn.apache.org/repos/asf/incubator/kafka]
>> >
>> > === Issue Tracking ===
>> > JIRA Kafka (KAFKA)
>> >
>> > === Other Resources ===
>> > The existing code already has unit tests, so we would like a Hudson
>> > instance to run them whenever a new patch is submitted. This can be added
>> > after project creation.
>> >
>> > == Initial Committers ==
>> >  * Jay Kreps
>> >  * Jun Rao
>> >  * Neha Narkhede
>> >  * Jakob Homan
>> >
>> > == Affiliations ==
>> >  * Jay Kreps (LinkedIn)
>> >  * Jun Rao (LinkedIn)
>> >  * Neha Narkhede (LinkedIn)
>> >  * Jakob Homan (LinkedIn)
>> >
>> > == Sponsors ==
>> > === Champion ===
>> > Chris Douglas (Apache Member)
>> >
>> > === Nominated Mentors ===
>> >  * Alan Cabrera (Apache Member)
>> >  * Geir Magnusson, Jr. (Apache Member and Director)
>> >  * Owen O'Malley (Apache Member)
>> >
>> > === Sponsoring Entity ===
>> > We are requesting the Incubator to sponsor this project.
>> >
>>
>
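
The Background section of the quoted proposal mentions relying on Java's
sendFile/transferTo capabilities to minimize intermediate buffer copies.
As a rough illustration only, and not Kafka's actual code, the sketch
below shows how FileChannel.transferTo can hand a file's bytes to a
channel without staging them in an application buffer. The class name
ZeroCopySend and the method sendFile are hypothetical names chosen for
this example, and the loop assumes a blocking target channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Kafka.
public final class ZeroCopySend {
    private ZeroCopySend() {}

    /**
     * Transfers the whole file at {@code source} to {@code target} using
     * FileChannel.transferTo and returns the number of bytes transferred.
     * Assumes {@code target} is a blocking channel.
     */
    public static long sendFile(Path source, WritableByteChannel target)
            throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // transferTo may move fewer bytes than requested, so keep
                // calling it from the current position until done.
                long sent = in.transferTo(position, size - position, target);
                if (sent <= 0) {
                    // No progress (for example, the file shrank underneath
                    // us); stop instead of spinning forever.
                    break;
                }
                position += sent;
            }
            return position;
        }
    }
}
```

When the target is a socket channel and the operating system supports it,
the JVM can typically satisfy transferTo with a sendfile-style call, so
the data moves from the page cache to the network without passing through
user-space buffers. With other channel types the call still works but may
fall back to ordinary copying.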

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
