Sounds great. +1 to including storm-kafka under a contrib folder of the Apache Storm project, and to pulling in other modules (moving forward) based on community members showing initiative in working on and maintaining them.

On Wed, Sep 4, 2013 at 10:23 PM, Nathan Marz <nathan.m...@gmail.com> wrote:

> I think that storm-kafka would make sense as a contrib module since it's
> widely used. I'm not sure what to do with the other storm-contrib modules.
> I figure the less code that's part of the initial repo the better, because
> there will be fewer contribution/legal issues to sort out. How about this -
> we plan to include storm-kafka under a contrib folder of the Apache Storm
> project (just because a lot of people depend on it), and we can pull other
> storm-contrib modules in if community members show initiative in working on
> and maintaining them?
>
> If that all sounds good I'll update the proposal accordingly.
>
> On Sep 4, 2013, at 6:41 PM, Joe Stein <crypt...@gmail.com> wrote:
>
> > What does this mean for storm contribs
> > (https://github.com/nathanmarz/storm-contrib), i.e. spouts & bolts?
> > For example, with the Apache Kafka spout it is already hard to know which
> > to use and which is best for 0.7.X and 0.8.X-betaX... Is the Apache Storm
> > project going to help corral that, or is it only for Storm core, as the
> > proposal implies with only the storm code base
> > https://github.com/nathanmarz/storm being part of the project?
> >
> > A lot of traffic on the existing user list is about spouts (e.g. the Kafka
> > Spout), and I was not sure if that would still be talked about or funneled
> > somewhere else, or what the thoughts/plans were for the parts built within
> > Storm that exist now?
> >
> > /*******************************************
> > Joe Stein
> > Founder, Principal Consultant
> > Big Data Open Source Security LLC
> > http://www.stealth.ly
> > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > ********************************************/
> >
> > On Wed, Sep 4, 2013 at 4:34 PM, Nathan Marz <nat...@nathanmarz.com> wrote:
> >
> >> We definitely need a storm-user list, as the existing Google Groups
> >> mailing list for Storm is quite active. So we'll need to transition that
> >> over. I agree on adding a storm-commits list and added it to the proposal.
> >>
> >> On Wed, Sep 4, 2013 at 11:50 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> >>
> >>> Excited about Storm coming to Apache. Small comment about the mailing
> >>> lists: for starting in the incubator, you may want to propose having
> >>> these instead:
> >>> * storm-dev
> >>> * storm-commits
> >>> * storm-private (with moderated subscriptions)
> >>>
> >>> However, Storm has been a well-known open source project, so maybe it is
> >>> valid to have storm-user from the beginning. But I think you may need a
> >>> storm-commits list to separate commit logs from dev discussions.
> >>> Mentors can chime in about this.
> >>>
> >>> Thanks,
> >>>
> >>> Henry
> >>>
> >>> On Wed, Sep 4, 2013 at 1:07 AM, Nathan Marz <nat...@nathanmarz.com> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> I'd like to propose Storm to be an Apache Incubator project. After much
> >>>> thought I believe this is the right next step for the project, and I
> >>>> look forward to hearing everyone's thoughts and feedback!
> >>>>
> >>>> Here's a link to the proposal:
> >>>> https://wiki.apache.org/incubator/StormProposal
> >>>>
> >>>> The proposal is also pasted below.
> >>>>
> >>>> -Nathan
> >>>>
> >>>> = Storm Proposal =
> >>>>
> >>>> == Abstract ==
> >>>>
> >>>> Storm is a distributed, fault-tolerant, and high-performance realtime
> >>>> computation system that provides strong guarantees on the processing of
> >>>> data.
> >>>>
> >>>> == Proposal ==
> >>>>
> >>>> Storm is a distributed real-time computation system. Similar to how
> >>>> Hadoop provides a set of general primitives for doing batch processing,
> >>>> Storm provides a set of general primitives for doing real-time
> >>>> computation. Its use cases span stream processing, distributed RPC,
> >>>> continuous computation, and more. Storm has become a preferred
> >>>> technology for near-realtime big-data processing by many organizations
> >>>> worldwide (see a partial list at
> >>>> https://github.com/nathanmarz/storm/wiki/Powered-By). As an open source
> >>>> project, Storm’s developer community has grown rapidly to 46 members.
> >>>>
> >>>> == Background ==
> >>>>
> >>>> The past decade has seen a revolution in data processing. MapReduce,
> >>>> Hadoop, and related technologies have made it possible to store and
> >>>> process data at scales previously unthinkable. Unfortunately, these data
> >>>> processing technologies are not realtime systems, nor are they meant to
> >>>> be. The lack of a "Hadoop of realtime" has become the biggest hole in
> >>>> the data processing ecosystem. Storm fills that hole.
> >>>>
> >>>> Storm was initially developed and deployed at BackType in 2011. After 7
> >>>> months of development BackType was acquired by Twitter in July 2011.
> >>>> Storm was open sourced in September 2011.
> >>>>
> >>>> Storm has been under continuous development on its Github repository
> >>>> since being open-sourced. It has undergone four major releases (0.5,
> >>>> 0.6, 0.7, 0.8) and many minor ones.
> >>>>
> >>>> == Rationale ==
> >>>>
> >>>> Storm is a general platform for low-latency big-data processing. It is
> >>>> complementary to the existing Apache projects, such as Hadoop. Many
> >>>> applications are actually exploring using both Hadoop and Storm for
> >>>> big-data processing. Bringing Storm into Apache is very beneficial to
> >>>> both the Apache community and the Storm community.
> >>>>
> >>>> The rapid growth of the Storm community is empowered by open source. We
> >>>> believe the Apache foundation is a great fit as the long-term home for
> >>>> Storm, as it provides an established process for community-driven
> >>>> development and decision making by consensus. This is exactly the model
> >>>> we want for future Storm development.
> >>>>
> >>>> == Initial Goals ==
> >>>>
> >>>> * Move the existing codebase to Apache
> >>>> * Integrate with the Apache development process
> >>>> * Ensure all dependencies are compliant with Apache License version 2.0
> >>>> * Incremental development and releases per Apache guidelines
> >>>>
> >>>> == Current Status ==
> >>>>
> >>>> Storm has undergone four major releases (0.5, 0.6, 0.7, 0.8) and many
> >>>> minor ones. Storm 0.9 is about to be released. Storm is being used in
> >>>> production by over 50 organizations. The Storm codebase is currently
> >>>> hosted at github.com, which will seed the Apache git repository.
> >>>>
> >>>> === Meritocracy ===
> >>>>
> >>>> We plan to invest in supporting a meritocracy. We will discuss the
> >>>> requirements in an open forum. Several companies have already expressed
> >>>> interest in this project, and we intend to invite additional developers
> >>>> to participate. We will encourage and monitor community participation so
> >>>> that privileges can be extended to those that contribute.
> >>>>
> >>>> === Community ===
> >>>>
> >>>> The need for a low-latency big-data processing platform in open source
> >>>> is tremendous. Storm is currently being used by at least 50
> >>>> organizations worldwide (see
> >>>> https://github.com/nathanmarz/storm/wiki/Powered-By), and is the most
> >>>> starred Java project on Github. By bringing Storm into Apache, we
> >>>> believe that the community will grow even bigger.
> >>>>
> >>>> === Core Developers ===
> >>>>
> >>>> Storm was started by Nathan Marz at BackType, and now has developers
> >>>> from Yahoo!, Microsoft, Alibaba, Infochimps, and many other companies.
> >>>>
> >>>> === Alignment ===
> >>>>
> >>>> In the big-data processing ecosystem, Storm is a very popular
> >>>> low-latency platform, while Hadoop is the primary platform for batch
> >>>> processing. We believe that having Hadoop and Storm aligned within the
> >>>> Apache foundation will help the further growth of the big-data
> >>>> community. The alignment is also beneficial to other Apache communities
> >>>> (such as Zookeeper, Thrift, Mesos). We could include additional
> >>>> sub-projects, Storm-on-YARN and Storm-on-Mesos, in the near future.
> >>>>
> >>>> == Known Risks ==
> >>>>
> >>>> === Orphaned Products ===
> >>>>
> >>>> The risk of the Storm project being abandoned is minimal. At least 50
> >>>> organizations (Twitter, Yahoo!, Microsoft, Groupon, Baidu, Alibaba,
> >>>> Alipay, Taobao, PARC, RocketFuel, etc.) are highly incentivized to
> >>>> continue development. Many of these organizations have built critical
> >>>> business applications upon Storm, and have devoted significant internal
> >>>> infrastructure investment to Storm.
> >>>>
> >>>> === Inexperience with Open Source ===
> >>>>
> >>>> Storm has existed as a healthy open source project for several years.
> >>>> During that time, we have curated an open-source community
> >>>> successfully, attracting over 40 developers from a diverse group of
> >>>> companies including Twitter, Yahoo!, and Alibaba.
> >>>>
> >>>> === Homogenous Developers ===
> >>>>
> >>>> The initial committers are employed by large companies (including
> >>>> Twitter, Yahoo!, Alibaba, Microsoft) and well-funded startups. Storm has
> >>>> an active community of developers, and we are committed to recruiting
> >>>> additional committers based on their contributions to the project.
> >>>>
> >>>> === Reliance on Salaried Developers ===
> >>>>
> >>>> It is expected that Storm development will occur on both salaried time
> >>>> and on volunteer time, after hours. The majority of initial committers
> >>>> are paid by their employer to contribute to this project. However, they
> >>>> are all passionate about the project, and we are confident that the
> >>>> project will continue even if no salaried developers contribute to the
> >>>> project. We are committed to recruiting additional committers including
> >>>> non-salaried developers.
> >>>>
> >>>> === Relationships with Other Apache Products ===
> >>>>
> >>>> As mentioned in the Alignment section, Storm is closely integrated with
> >>>> Hadoop, Zookeeper, Thrift, YARN and Mesos in numerous ways. We look
> >>>> forward to collaborating with those communities, as well as other Apache
> >>>> communities (including Apache S4 which focuses on stateful low-latency
> >>>> processing).
> >>>>
> >>>> === An Excessive Fascination with the Apache Brand ===
> >>>>
> >>>> Storm is already a healthy and well known open source project. This
> >>>> proposal is not for the purpose of generating publicity. Rather, the
> >>>> primary benefits to joining Apache are those outlined in the Rationale
> >>>> section.
> >>>>
> >>>> == Documentation ==
> >>>>
> >>>> The reader will find these websites highly relevant:
> >>>>
> >>>> * Storm website: http://storm-project.net
> >>>> * Storm documentation: https://github.com/nathanmarz/storm/wiki
> >>>> * Codebase: https://github.com/nathanmarz/storm
> >>>> * User group: https://groups.google.com/group/storm-user
> >>>>
> >>>> == Source and Intellectual Property Submission Plan ==
> >>>>
> >>>> The Storm codebase is currently hosted on Github:
> >>>> https://github.com/nathanmarz/storm.
> >>>>
> >>>> This is the exact codebase that we would migrate to the Apache
> >>>> foundation.
> >>>>
> >>>> The Storm source code is currently licensed under Eclipse Public License
> >>>> Version 1.0. Some source code was contributed under a contributor
> >>>> agreement based on the Sun contributor agreement (v1.5). More recent
> >>>> code has been contributed under an Apache style agreement (see
> >>>> https://dl.dropboxusercontent.com/u/133901206/storm-apache-style-cla.txt).
> >>>>
> >>>> Upon entering Apache, Storm will migrate to an Apache License 2.0 with
> >>>> all contributions licensed to the Apache Foundation. In certain cases
> >>>> where individuals or organizations hold copyright, we will ensure they
> >>>> grant a license to the Apache Foundation. Going forward, all commits
> >>>> will be licensed directly to the Apache foundation through our signed
> >>>> Individual Contributor License Agreements for all committers on the
> >>>> project.
> >>>>
> >>>> Yahoo! is also willing to move the Storm-on-YARN code from Github to be
> >>>> a subproject of the Apache Storm project. Storm-on-YARN is currently
> >>>> licensed under Apache License 2.0 and receives contributions under an
> >>>> Apache style CLA. Upon entering Apache, Yahoo! will sign over copyright
> >>>> to the Apache foundation.
> >>>>
> >>>> == External Dependencies ==
> >>>>
> >>>> To the best of our knowledge, all of Storm's dependencies (except
> >>>> 0MQ/JMQ) are distributed under Apache compatible licenses. Upon
> >>>> acceptance to the incubator, we would begin a thorough analysis of all
> >>>> transitive dependencies to verify this fact and introduce license
> >>>> checking into the build and release process (for instance integrating
> >>>> Apache Rat).
> >>>>
> >>>> Storm has used 0MQ and JMQ as the default mechanism for its internal
> >>>> messaging layer, and 0MQ/JMQ is licensed under the GNU Lesser General
> >>>> Public License. Recently, we have made the Storm messaging layer
> >>>> pluggable, and plan to use Netty (which is licensed under Apache License
> >>>> v2) as our default messaging plugin (while keeping 0MQ as an optional
> >>>> plugin).
> >>>>
> >>>> == Cryptography ==
> >>>>
> >>>> We do not expect Storm to be a controlled export item due to the use of
> >>>> encryption.
> >>>>
> >>>> Storm enables encryption via 2 plugins:
> >>>>
> >>>> * SASL authentication plugins … Currently, we provide “no-op”
> >>>> authentication and digest authentication. In the near future, we will
> >>>> introduce Kerberos authentication.
> >>>> * Tuple payload serialization plugins … Storm provides plugins for
> >>>> plain-object serialization and blowfish encryption.
> >>>>
> >>>> == Required Resources ==
> >>>>
> >>>> === Mailing lists ===
> >>>>
> >>>> * storm-user
> >>>> * storm-dev
> >>>> * storm-private (with moderated subscriptions)
> >>>>
> >>>> === Subversion Directory ===
> >>>>
> >>>> Git is the preferred source control system: git://git.apache.org/storm
> >>>>
> >>>> === Issue Tracking ===
> >>>>
> >>>> JIRA Storm (STORM)
> >>>>
> >>>> == Initial Committers ==
> >>>>
> >>>> * Nathan Marz <nathan at nathanmarz dot com>
> >>>> * James Xu <xumingmingv at gmail dot com>
> >>>> * Jason Jackson <jason at cvk dot ca>
> >>>> * Andy Feng <afeng at yahoo-inc dot com>
> >>>> * Flip Kromer <flip at infochimps dot com>
> >>>> * David Lao <davidlao at microsoft dot com>
> >>>> * P. Taylor Goetz <ptgoetz at gmail dot com>
> >>>>
> >>>> == Affiliations ==
> >>>>
> >>>> * Nathan Marz - Nathan’s Startup
> >>>> * James Xu - Alibaba
> >>>> * Jason Jackson - Twitter
> >>>> * Andy Feng - Yahoo!
> >>>> * Flip Kromer - Infochimps
> >>>> * David Lao - Microsoft
> >>>> * P. Taylor Goetz - Health Market Science
> >>>>
> >>>> == Sponsors ==
> >>>>
> >>>> === Champion ===
> >>>>
> >>>> * Doug Cutting <cutting at apache dot org>
> >>>>
> >>>> === Nominated Mentors ===
> >>>>
> >>>> * Ted Dunning <tdunning at maprtech.com>
> >>>> * Arvind Prabhaker <arvind at apache dot org>
> >>>> * Devaraj Das <ddas at hortonworks dot com>
> >>>>
> >>>> === Sponsoring Entity ===
> >>>>
> >>>> The Apache Incubator
> >>
> >> --
> >> Twitter: @nathanmarz
> >> http://nathanmarz.com