+1 (binding) On Thu, Dec 11, 2014 at 10:02 AM, Daniel Dai <dai...@gmail.com> wrote:
> Following the discussion earlier, I'm calling a vote to accept SAMOA as a > new Incubator project. > > [ ] +1 Accept SAMOA into the Incubator > [ ] +0 Indifferent to the acceptance of SAMOA > [ ] -1 Do not accept SAMOA because ... > > The vote will be open for at least 72h and closes at the earliest on Dec 14 > 19:00 GMT. > > https://wiki.apache.org/incubator/SAMOAProposal > > Thanks, > Daniel > > = SAMOA = > == Abstract == > SAMOA is an an open-source platform for mining big data streams. > > == Proposal == > SAMOA provides a collection of distributed streaming algorithms for the > most common data mining and machine learning tasks such as classification, > clustering, and regression, as well as programming abstractions to develop > new algorithms that run on top of distributed stream processing engines > (DSPEs). It features a pluggable architecture that allows it to run on > several DSPEs such as Apache Storm, Apache S4, and Apache Samza. > > == Background == > Hadoop and its ecosystem have changed the way data are processed by > allowing to push algorithms to unprecedented scale. As an example, Mahout > allows to run data mining and machine learning algorithms on very large > datasets. However, Hadoop and Mahout are not suited to handle streaming > data. Simply put, the goal of SAMOA is to provide a streaming counterpart > to Mahout. > > == Rationale == > SAMOA aims to fill the current gap in tools for mining large scale streams. > Many organizations can benefit from a scalable stream mining platform > system such as SAMOA. > > SAMOA is a natural fit for the Apache Software Foundation. It is licensed > under the ASL v2.0. It already interoperates with several existing Apache > projects such as Storm, S4, and Samza. Furthermore, it is complementary to > existing Apache projects such as Mahout. The initial committers are > familiar with the Apache process and subscribes to the Apache mission. > Indeed, the team includes multiple Apache committers. Finally, joining > Apache will help coordinate the development effort of the growing number of > organizations which contribute to SAMOA. > > == Initial Goals == > * Move the existing codebase to Apache > * Integrate with the Apache development process > * Incremental development and releases per Apache guidelines > > == Current Status == > SAMOA started as a research project at Yahoo Labs in 2013 and was > open-sourced in October the same year. It has been under development on > Yahoo's public GitHub repository since being open-sourced. It has undergone > two releases (0.1, 0.2). > > === Meritocracy === > The SAMOA project already operates on meritocratic principles. Today, SAMOA > has several developers and has accepted multiple patches from outside of > Yahoo Labs. However, our intent with this incubator proposal is to start > building a more diverse developer community around SAMOA that follows the > Apache meritocracy model. We will identify all committers and PPMC members > for the project operating under the ASF meritocratic principles. We plan to > continue support for new contributors and work with those who contribute > significantly to the project to make them committers. > > === Community === > SAMOA is currently being used internally at Yahoo. Acceptance into the > Apache foundation would bolster the existing user and developer community > around SAMOA. That community includes contributors from several > institutions, active mostly on GitHub's pages. SAMOA has been starred more > than 300 times and forked more than 50 times on GitHub as of November 2014. > > === Core Developers === > The core developers are a diverse group, many of which already very > experienced with open source. There are two existing Apache committers, > along with people from various companies and universities. > > === Alignment === > The ASF is the natural choice to host SAMOA. First, its goal of encouraging > community-driven open-source projects fits with our vision for SAMOA. > Additionally, many other projects that SAMOA is based on, such as Apache > Storm, S4, Samza, and HDFS, are hosted by the ASF. Close proximity of SAMOA > to these projects within the ASF will provide mutual benefit. > > == Known Risks == > === Orphaned Products === > Given the current level of investment in SAMOA the risk of the project > being abandoned is minimal. There are several constituents who are highly > incentivized to continue development, and Yahoo Labs relies on SAMOA as a > platform for a large number of long-term research projects. However, the > small number of initial committers might be a concern. We plan to address > this issue during incubation by growing the community and the number of > committers. > > === Inexperience with Open Source === > SAMOA has existed as a healthy open source project for one year. During > this time, we have curated an open-source community successfully, > attracting developers from a diverse group of universities and companies > including Huawei, Yahoo, University of Porto, and Universitat Politecnica > de Catalunya. > > Gianmarco is a committer for Apache Pig, Matthieu for Apache S4. Albert is > one of the lead developers of MOA, an open-source tool for streaming > machine learning. > > === Homogenous Developers === > The initial list of committers includes developers from several > institutions, both academic and industrial. The committers are > geographically distributed across Europe, America, and Asia. > > === Reliance on Salaried Developers === > Like most open source projects, SAMOA receives a substantial support from > salaried developers. In addition, those working from within corporations > often devote “after hours” or spare time in the project - and these come > from several organizations. We will work to ensure the ability for the > project to continuously be stewarded and to proceed forward independently > of salaried developers. > > === Relationship with Other Apache Products === > SAMOA interoperates with several existing Apache projects, mainly by using > them as stream processing engines: Apache Storm, Apache S4, and Apache > Samza. It is a counterpart of Apache Mahout for streaming. It also uses > several other Apache components, including Apache Maven and several Apache > Commons libraries. > > === A Excessive Fascination with the Apache Brand === > SAMOA is already a healthy and relatively well known open source project. > This proposal is not for the purpose of generating publicity. Rather, the > primary benefits to joining Apache are those outlined in the Rationale > section. We are more interested in establishing a strong community that can > drive the project independently of Yahoo. > > == Documentation == > The reader will find these websites relevant: > > * SAMOA website: http://samoa-project.net/ > * SAMOA documentation: https://github.com/yahoo/samoa/wiki/ > * Issue tracking: https://github.com/yahoo/samoa/issues > * Codebase: https://github.com/yahoo/samoa > * User group: http://groups.google.com/group/samoa-user > > == Initial Source == > The SAMOA codebase is currently hosted on GitHub: > https://github.com/yahoo/samoa. This is the exact codebase that we would > migrate to the Apache foundation. > > == Source and Intellectual Property Submission Plan == > Currently, the SAMOA codebase is distributed under an Apache license v2.0. > The vast majority of code has copyright held by Yahoo. Upon entering the > Incubator, Yahoo will grant a license to the Apache foundation. In certain > cases where individuals or organizations hold copyright, we will ensure > they grant a license to the Apache foundation. Going forward, all commits > will be licensed directly to the Apache foundation through our signed > Individual Contributor License Agreements for all committers on the > project. > > == Cryptography == > We do not expect SAMOA to be a controlled export item due to the use of > encryption. > > == External Dependencies == > To the best of our knowledge, all dependencies of SAMOA are distributed > under Apache compatible licenses. Upon acceptance to the incubator, we > would begin a thorough analysis of all transitive dependencies to verify > this fact and introduce license checking into the build and release process > (for instance integrating Apache Rat). > > == Required Resources == > === Mailing Lists === > We will migrate the existing SAMOA mailing lists as follows: > > * samoa-users@googlegroups --> us...@samoa.incubator.apache.org > * samoa-developers@googlegroups --> d...@samoa.incubator.apache.org > > SAMOA commits are hosted on GitHub, so we would request the following > mailing list: > > * comm...@samoa.incubator.apache.org > > We would also request the following mailing list: > > * priv...@samoa.incubator.apache.org (with moderated subscription) > > === Source control === > The SAMOA team would like to use Git for source control, due to our current > use of Git. We request a writeable Git repo for SAMOA, and mirroring to be > set up to GitHub through INFRA. > > * https://git-wip-us.apache.org/repos/asf/incubator-samoa.git > > === Issue Tracking === > SAMOA currently uses GitHub for issue tracking. We will migrate to the > Apache JIRA instance. http://issues.apache.org/jira/browse/SAMOA > > == Initial Committers & Affiliations == > * Albert Bifet, Huawei, <abifet at waikato dot ac dot nz> > * Gianmarco De Francisci Morales, Yahoo Labs, <gdfm at apache dot org> > * Nicolas Kourtellis, Yahoo Labs, <nkourtellis at gmail dot com> > * Matthieu Morel, Yahoo Labs, <mmorel at apache dot org> > * Arinto Murdopo, Living Analytics Research Centre, <arintom at smu dot edu > dot sg> > * Olivier Van Laere, BlueShift Labs, <olivier at getblueshift dot com> > > == Sponsors == > === Champion === > * Daniel Dai <daijy at apache dot org> > > === Nominated Mentors === > * Alan Gates <gates at apache dot org> > * Ted Dunning <tdunning at apache dot org> > * Ashutosh Chauhan <hashutosh at apache dot org> > * Enis Soztutar <enis at apache dot org> > > === Sponsoring Entity === > The Apache Incubator > -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)