+1 (binding)

-C

On Fri, Jul 26, 2013 at 12:52 PM, Jakob Homan <jgho...@gmail.com> wrote:
> Incubator-
>
> Following the discussion earlier this week, I'm calling a vote to accept
> Samza as a new Incubator project.
>
> The proposal draft is available at:
> https://wiki.apache.org/incubator/SamzaProposal,
> and is also included below. It is identical to what was proposed in the
> discussion except for removing the user list, per Marvin's suggestion.
>
> Vote is open for at least 96h and closes at the earliest on 30 July 13:00
> PDT. I'm letting the vote run an extra day as we're bookending the weekend
> and I want to give everybody a reasonable workweek margin.
>
> [ ] +1 accept Samza in the Incubator
> [ ] +/-0
> [ ] -1 because...
>
> Here's my binding +1
>
> -Jakob
>
> ========================================================================================================
>
> Abstract
>
> Samza is a stream processing system for running continuous computation on
> infinite streams of data.
>
> Proposal
>
> Samza provides a system for processing stream data from publish-subscribe
> systems such as Apache Kafka. The developer writes a stream processing
> task and executes it as a Samza job. Samza then routes messages between
> stream processing tasks and the publish-subscribe systems that the messages
> are addressed to.
>
> Background
>
> Samza was developed at LinkedIn to enable easier processing of streaming
> data on top of Apache Kafka. Current use cases include content processing
> pipelines, aggregating operational log data, data ingestion into
> distributed database infrastructure, and measuring user activity across
> different aggregation types.
>
> Samza is focused on providing an easy-to-use framework to process streams.
> It uses Apache YARN to provide a mechanism for deploying stream processing
> tasks in a distributed cluster.
> Samza also takes advantage of YARN to make decisions about stream
> processor locality and co-partitioning of streams, and to provide
> security. Apache Kafka is also leveraged to provide a mechanism to pass
> messages from one stream processor to the next. Apache Kafka is also used
> to help manage a stream processor's state, so that it can be recovered in
> the event of a failure.
>
> Samza is written in Scala. It was developed internally at LinkedIn to meet
> our particular use cases, but will be useful to many organizations facing a
> similar need to reliably process large amounts of streaming data.
> Therefore, we would like to share it with the ASF and begin developing a
> community of developers and users within Apache.
>
> Rationale
>
> Many organizations can benefit from a reliable stream processing system
> such as Samza. While our use case of processing events from a large website
> like LinkedIn has driven the design of Samza, its uses are varied and we
> expect many new use cases to emerge. Samza provides a generic API to
> process messages from streaming infrastructure and will appeal to many
> users.
>
> Current Status
>
> Meritocracy
>
> Our intent with this incubator proposal is to start building a diverse
> developer community around Samza following the Apache meritocracy model.
> Since Samza was initially developed in late 2011, we have had fast adoption
> and contributions by multiple teams at LinkedIn. We plan to continue
> supporting new contributors and to work with those who contribute
> significantly to the project to make them committers.
>
> Community
>
> Samza is currently being used internally at LinkedIn. We hope to extend our
> contributor base significantly and invite all those who are interested in
> building large-scale distributed systems to participate.
>
> Core Developers
>
> Samza is currently being developed by four engineers at LinkedIn: Jay
> Kreps, Jakob Homan, Sriram Subramanian, and Chris Riccomini.
> Jakob is an ASF Member, Incubator PMC member and PMC member on Apache
> Hadoop, Kafka and Giraph. Jay is a member of the Apache Kafka PMC and a
> contributor to various Apache projects. Chris has been an active contributor
> to several projects, including Apache Kafka and Apache YARN. Sriram has
> contributed to Samza, as well as Apache Kafka.
>
> Alignment
>
> The ASF is the natural choice to host the Samza project, as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Samza. Additionally, many other projects with which we are familiar and
> with which we expect Samza to integrate, such as Apache ZooKeeper, YARN,
> HDFS and log4j, are hosted by the ASF, and we will both benefit from and
> provide benefit through close proximity to them.
>
> Known Risks
>
> Orphaned Products
>
> The core developers plan to work full time on the project. There is very
> little risk of Samza being abandoned as it is part of LinkedIn's internal
> infrastructure.
>
> Inexperience with Open Source
>
> All of the core developers have experience with open source development.
> Jay and Chris have been involved with several open source projects released
> by LinkedIn, and Jay is a committer on Apache Kafka. Jakob has been
> actively involved with the ASF as a full-time Hadoop committer and PMC
> member. Sriram is a contributor to Apache Kafka.
>
> Homogeneous Developers
>
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations, and we are actively encouraging new contributors via the
> mailing lists and public presentations of Samza.
>
> Reliance on Salaried Developers
>
> Currently, the developers are paid to do work on Samza. However, once the
> project has a community built around it, we expect to get committers,
> developers and community members from outside the current core developers.
However, > because LinkedIn relies on Samza internally, the reliance on salaried > developers is unlikely to change. > > Relationships with Other Apache Products > > Samza is deeply integrated with Apache products. Samza uses Apache Kafka as > its underlying message passing system. Samza also uses Apache YARN for task > scheduling. Both YARN and Kafka, in turn, rely on Apache ZooKeeper for > coordination. In addition, we hope to integrate with Apache HDFS in the > near future. > > An Excessive Fascination with the Apache Brand > > While we respect the reputation of the Apache brand and have no doubts that > it will attract contributors and users, our interest is primarily to give > Samza a solid home as an open source project following an established > development model. We have also given reasons in the Rationale and > Alignment sections. > > Documentation > > http://wiki.apache.org/incubator/SamzaProposal > > Initial Source > > Available upon request. > > External Dependencies > > The dependencies all have Apache compatible licenses. > > metrics (Apache 2.0) > zkclient (Apache 2.0) > zookeeper (Apache 2.0) > jetty (Apache 2.0) > jackson (Apache 2.0) > commons-httpclient (Apache 2.0) > slf4j (MIT) > avro (Apache 2.0) > hadoop (Apache 2.0) > junit (Common Public License) > grizzled-slf4j (BSD) > scalatra (https://github.com/scalatra/scalatra/blob/develop/LICENSE) > scala (http://www.scala-lang.org/node/146) > joptsimple (MIT) > kafka (Apache 2.0) > scalate (Apache 2.0) > leveldb jni (BSD) > Cryptography > > Samza will depend on secure Hadoop, which can optionally use Kerberos. 
>
> Required Resources
>
> Mailing Lists
>
> samza-private for private PMC discussions (with moderated subscriptions)
> samza-dev
> samza-commits
>
> Subversion Directory
>
> Git is the preferred source control system: git://git.apache.org/samza
>
> Issue Tracking
>
> JIRA Samza (SAMZA)
>
> Other Resources
>
> The existing code already has unit tests, so we would like a Hudson
> instance to run them whenever a new patch is submitted. This can be added
> after project creation.
>
> Initial Committers
>
> Jay Kreps
> Jakob Homan
> Chris Riccomini
> Sriram Subramanian
>
> Affiliations
>
> Jay Kreps (LinkedIn)
> Jakob Homan (LinkedIn)
> Chris Riccomini (LinkedIn)
> Sriram Subramanian (LinkedIn)
>
> Sponsors
>
> Champion
>
> Jakob Homan (Apache Member)
>
> Nominated Mentors
>
> Arun C Murthy <acmurthy at apache dot org>
> Chris Douglas <cdouglas at apache dot org>
> Roman Shaposhnik <rvs at apache dot org>
>
> Sponsoring Entity
>
> We are requesting that the Incubator sponsor this project.