Re: [PROPOSAL] Kafka for the Apache Incubator

Jeffrey Damick Fri, 24 Jun 2011 16:43:08 -0700

+1 from us as well. At Neustar we are working on a large deployment using
Kafka too, and we'd be interested in helping in the future.


-jeff


On Fri, Jun 24, 2011 at 2:17 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:

> +1
>
> A very good proposal. It looks like it would help meet our need for a
> low-latency event messaging system, so I'm looking forward to it.
>
> I would love to contribute to the project and have added my name to the
> list of initial committers, if there are no objections.
>
> - Henry
>
> >> 2011/6/22 Jun Rao <jun...@gmail.com>
> >>
> >> > Hi,
> >> >
> >> > I would like to propose Kafka as an Apache Incubator project. Kafka is
> >> > a distributed, high-throughput, publish-subscribe system for processing
> >> > large amounts of streaming data.
> >> >
> >> > Here's a link to the proposal in the Incubator wiki
> >> > http://wiki.apache.org/incubator/KafkaProposal
> >> >
> >> > I've also pasted the initial contents below.
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > == Abstract ==
> >> > Kafka is a distributed publish-subscribe system for processing large
> >> > amounts of streaming data.
> >> >
> >> > == Proposal ==
> >> > Kafka provides an extremely high-throughput, distributed publish/subscribe
> >> > messaging system. Additionally, it supports relatively long-term
> >> > persistence of messages to support a wide variety of consumers,
> >> > partitioning of the message stream across servers and consumers, and
> >> > functionality for loading data into Apache Hadoop for offline, batch
> >> > processing.
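The following Java sketch gives a rough picture of what "persistence" and "partitioning" mean here by modeling a partitioned, append-only message log in memory. It is hypothetical. The `PartitionedLog` class, key-hash partitioning, and round-robin assignment of partitions to consumers are assumptions made for illustration; they are not Kafka's API or its exact policies. What it shows is that the log keeps no per-consumer state. Each consumer reads from an offset it tracks itself, so many consumers can read the same data at different paces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a partitioned, append-only message log.
// Not Kafka's API: class/method names and the partitioning rules are assumptions.
public class PartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    public PartitionedLog(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Pick a partition from the message key; floorMod keeps negative hash codes in range.
    public int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Append to the end of the key's partition and return the new message's offset.
    public long append(String key, String message) {
        List<String> p = partitions.get(partitionFor(key));
        p.add(message);
        return p.size() - 1;
    }

    // Read up to maxMessages starting at a consumer-supplied offset.
    // The log stores no consumer positions; each consumer remembers its own offset.
    public List<String> read(int partition, long offset, int maxMessages) {
        List<String> p = partitions.get(partition);
        int from = (int) Math.min(offset, p.size());
        int to = Math.min(from + maxMessages, p.size());
        return new ArrayList<>(p.subList(from, to));
    }

    // Spread partitions over a group of consumers (round-robin; an assumed policy).
    public static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> owned = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) {
            owned.add(new ArrayList<>());
        }
        for (int part = 0; part < numPartitions; part++) {
            owned.get(part % numConsumers).add(part);
        }
        return owned;
    }

    public static void main(String[] args) {
        PartitionedLog log = new PartitionedLog(4);
        int p = log.partitionFor("user-42");
        log.append("user-42", "page_view:/home");
        log.append("user-42", "search:kafka");
        // Two independent consumers positioned at different offsets of the same partition.
        System.out.println(log.read(p, 0, 10)); // [page_view:/home, search:kafka]
        System.out.println(log.read(p, 1, 10)); // [search:kafka]
        System.out.println(assign(4, 2));        // [[0, 2], [1, 3]]
    }
}
```

Because messages with the same key land in the same partition, their relative order is preserved within that partition. Spreading partitions across consumers lets a group share the read load.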
> >> >
> >> > == Background ==
> >> > Kafka was developed at LinkedIn to process the large amounts of events
> >> > generated by that company's website and provide a common repository for
> >> > many types of consumers to access and process those events. Kafka has
> >> > been used in production at LinkedIn scale to handle dozens of types of
> >> > events, including page views, searches and social network activity.
> >> > Kafka clusters at LinkedIn currently process more than two billion
> >> > events per day.
> >> >
> >> > Kafka fills the gap between messaging systems such as Apache ActiveMQ,
> >> > which can provide high-volume messaging but lack persistence of those
> >> > messages, and log processing systems such as Scribe and Flume, which do
> >> > not provide adequate latency for our diverse set of consumers. Kafka can
> >> > also be inserted into traditional log-processing systems, acting as an
> >> > intermediate step before further processing. Kafka focuses relentlessly
> >> > on performance and throughput by neither introspecting into message
> >> > content nor indexing messages on the broker. We also achieve high
> >> > performance by relying on the OS sendfile call (which Java exposes
> >> > through FileChannel.transferTo) to minimize intermediate buffer copies,
> >> > and on the OS's page cache to efficiently serve message contents to
> >> > consumers.
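For readers unfamiliar with the zero-copy technique mentioned above, here is a minimal Java sketch that sends part of a file to another channel with the standard `FileChannel.transferTo` method. It is not Kafka's actual broker code. When the target is a socket, the JDK can delegate this call to the OS `sendfile` call on platforms such as Linux, so the bytes need not pass through user-space buffers. To keep the example self-contained, the target here is a second file rather than a consumer connection. The file names and the `transfer` helper are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal illustration of FileChannel.transferTo; not Kafka's broker code.
public class ZeroCopyTransfer {

    // Send `count` bytes of `source`, starting at byte `position`, to `target`.
    // transferTo may move fewer bytes than requested, so keep calling it.
    public static long transfer(FileChannel source, long position, long count,
                                WritableByteChannel target) throws IOException {
        long sent = 0;
        while (sent < count) {
            long n = source.transferTo(position + sent, count - sent, target);
            if (n <= 0) {
                break; // no progress (e.g. position at or past end of file): stop instead of spinning
            }
            sent += n;
        }
        return sent;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zerocopy");
        Path segment = dir.resolve("segment.log"); // hypothetical log segment file
        Path copy = dir.resolve("copy.log");       // stands in for a consumer's socket
        Files.write(segment, "msg1\nmsg2\nmsg3\n".getBytes(StandardCharsets.UTF_8));

        try (FileChannel src = FileChannel.open(segment, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(copy, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            // Serve everything after the first 5-byte message ("msg1\n").
            long sent = transfer(src, 5, src.size() - 5, dst);
            System.out.println(sent); // 10
        }
        System.out.print(new String(Files.readAllBytes(copy), StandardCharsets.UTF_8));
    }
}
```

Serving a byte range of a file by offset fits naturally with a broker that does not index message contents. The broker only needs to know where in the log a consumer wants to start reading.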
> >> >
> >> > Kafka is written in Scala and depends on Apache ZooKeeper for
> >> > coordination amongst its producers, brokers and consumers.
> >> >
> >> > Kafka was developed internally at LinkedIn to meet our particular use
> >> > cases, but it will be useful to many organizations facing a similar need
> >> > to reliably process large amounts of streaming data. Therefore, we would
> >> > like to share it with the ASF and begin developing a community of
> >> > developers and users within Apache.
> >> >
> >> > == Rationale ==
> >> > Many organizations can benefit from a reliable stream processing system
> >> > such as Kafka. While our use case of processing events from a very large
> >> > website like LinkedIn has driven the design of Kafka, its uses are varied
> >> > and we expect many new use cases to emerge. Kafka provides a natural
> >> > bridge between near real-time event processing and offline batch
> >> > processing and will appeal to many users.
> >> >
> >> > == Current Status ==
> >> > === Meritocracy ===
> >> > Our intent with this incubator proposal is to start building a diverse
> >> > developer community around Kafka following the Apache meritocracy model.
> >> > Since Kafka was open sourced we have solicited contributions via the
> >> > website and presentations given to user groups and technical audiences.
> >> > We have had positive responses to these and have received several
> >> > contributions and clients for other languages. We plan to continue this
> >> > support for new contributors and work with those who contribute
> >> > significantly to the project to make them committers.
> >> >
> >> > === Community ===
> >> > Kafka is currently being developed by engineers within LinkedIn and is
> >> > used in production at that company. Additionally, we have active users
> >> > at, or have received contributions from, a diverse set of companies
> >> > including MediaSift, SocialTwist, Clearspring and Urban Airship. Recent
> >> > public presentations of Kafka and its goals garnered much interest from
> >> > potential contributors. We hope to extend our contributor base
> >> > significantly and invite all those who are interested in building
> >> > high-throughput distributed systems to participate. We have begun
> >> > receiving contributions from outside of LinkedIn, including clients for
> >> > several languages such as Ruby, PHP, Clojure, .NET and Python.
> >> >
> >> > To further this goal, we use GitHub's issue tracking and branching
> >> > facilities and maintain a public mailing list via Google Groups.
> >> >
> >> > === Core Developers ===
> >> > Kafka is currently being developed by four engineers at LinkedIn: Neha
> >> > Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> >> > Apache as a Cassandra committer and PMC member. Neha has been an active
> >> > contributor to several projects LinkedIn has open sourced, including
> >> > Bobo, Sensei and Zoie. Jay has experience with open source software as
> >> > the originator of Project Voldemort, as well as being active within the
> >> > Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> >> > member and a previous Apache ZooKeeper contributor.
> >> >
> >> > === Alignment ===
> >> > The ASF is the natural choice to host the Kafka project, as its goal of
> >> > encouraging community-driven open-source projects fits with our vision
> >> > for Kafka. Additionally, many other projects with which we are familiar
> >> > and which we expect Kafka to integrate with, such as Apache Hadoop, Pig,
> >> > ZooKeeper and log4j, are hosted by the ASF, and we will both benefit
> >> > from and provide benefit to them through close proximity.
> >> >
> >> > == Known Risks ==
> >> > === Orphaned Products ===
> >> > The core developers plan to work full time on the project. There is very
> >> > little risk of Kafka being abandoned, as it is a critical part of
> >> > LinkedIn's internal infrastructure and is in production use.
> >> >
> >> > === Inexperience with Open Source ===
> >> > All of the core developers have experience with open source development.
> >> > LinkedIn open sourced Kafka several months ago and has been receiving
> >> > contributions since. Jun is an Apache Cassandra committer and PMC member.
> >> > Jay and Neha have been involved with several open source projects
> >> > released by LinkedIn. Jakob has been actively involved with the ASF as a
> >> > full-time Hadoop committer and PMC member.
> >> >
> >> > === Homogeneous Developers ===
> >> > The current core developers are all from LinkedIn. However, we hope to
> >> > establish a developer community that includes contributors from several
> >> > corporations, and we are actively encouraging new contributors via the
> >> > mailing lists and public presentations of Kafka.
> >> >
> >> > === Reliance on Salaried Developers ===
> >> > Currently, the developers are paid to do work on Kafka. However, once the
> >> > project has a community built around it, we expect to get committers,
> >> > developers and community from outside the current core developers. Even
> >> > so, because LinkedIn relies on Kafka internally, the reliance on
> >> > salaried developers is unlikely to change.
> >> >
> >> > === Relationships with Other Apache Products ===
> >> > Kafka is deeply integrated with Apache products. Kafka uses Apache
> >> > ZooKeeper to coordinate its state amongst the brokers, consumers, and
> >> > soon, the producers. Kafka provides input formats to allow Hadoop
> >> > MapReduce to load data directly from Kafka. Kafka provides an appender
> >> > to allow consuming data directly from Apache log4j.
> >> >
> >> > === An Excessive Fascination with the Apache Brand ===
> >> > While we respect the reputation of the Apache brand and have no doubts
> >> > that it will attract contributors and users, our interest is primarily
> >> > to give Kafka a solid home as an open source project following an
> >> > established development model. We have also given reasons in the
> >> > Rationale and Alignment sections.
> >> >
> >> > == Documentation ==
> >> > Information about Kafka can be found at [http://sna-projects.com/kafka/].
> >> > The following links provide more information about the project:
> >> >
> >> >  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
> >> >  * The GitHub site: [https://github.com/kafka-dev/kafka]
> >> >  * Kafka overview from Jay Kreps:
> >> >    [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
> >> >  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
> >> >  * Kafka paper at NetDB 2011:
> >> >    [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
> >> >
> >> > == Initial Source ==
> >> > Kafka has been under development at LinkedIn since November 2009. It was
> >> > open sourced by LinkedIn in January 2011. It is currently hosted on
> >> > GitHub under the Apache license at [https://github.com/kafka-dev/kafka].
> >> >
> >> > Kafka is mainly written in Scala with some performance testing code in
> >> > Java. Several clients have been contributed in other languages,
> >> > including Ruby, PHP, Clojure, .NET and Python. Its source tree is
> >> > entirely self-contained and relies on the Simple Build Tool (sbt) as its
> >> > build system and dependency resolution mechanism.
> >> >
> >> > == External Dependencies ==
> >> > The dependencies all have Apache compatible licenses.
> >> >
> >> > == Cryptography ==
> >> > Not applicable.
> >> >
> >> > == Required Resources ==
> >> > === Mailing Lists ===
> >> >  * kafka-private for private PMC discussions (with moderated subscriptions)
> >> >  * kafka-dev
> >> >  * kafka-commits
> >> >  * kafka-user
> >> >
> >> > === Subversion Directory ===
> >> > [https://svn.apache.org/repos/asf/incubator/kafka]
> >> >
> >> > === Issue Tracking ===
> >> > JIRA Kafka (KAFKA)
> >> >
> >> > === Other Resources ===
> >> > The existing code already has unit tests, so we would like a Hudson
> >> > instance to run them whenever a new patch is submitted. This can be
> >> > added after project creation.
> >> >
> >> > == Initial Committers ==
> >> >  * Jay Kreps
> >> >  * Jun Rao
> >> >  * Neha Narkhede
> >> >  * Jakob Homan
> >> >
> >> > == Affiliations ==
> >> >  * Jay Kreps (LinkedIn)
> >> >  * Jun Rao (LinkedIn)
> >> >  * Neha Narkhede (LinkedIn)
> >> >  * Jakob Homan (LinkedIn)
> >> >
> >> > == Sponsors ==
> >> > === Champion ===
> >> > Chris Douglas (Apache Member)
> >> >
> >> > === Nominated Mentors ===
> >> >  * Alan Cabrera (Apache Member)
> >> >  * Geir Magnusson, Jr. (Apache Member and Director)
> >> >  * Owen O'Malley (Apache Member)
> >> >
> >> > === Sponsoring Entity ===
> >> > We are requesting the Incubator to sponsor this project.
> >> >
> >>
> >
>