+1 (Non-binding) On 10/27/15, 11:08 PM, "Raymond Feng" <enjoyj...@gmail.com> wrote:
>Luciano, > >There is a copy/paste error pointing to >http://wiki.apache.org/incubator/Nuvem. > >Sent from my iPhone 6 Plus > >> On Oct 27, 2015, at 10:03 PM, Luciano Resende <luckbr1...@gmail.com> >>wrote: >> >> On Tue, Oct 27, 2015 at 9:52 PM, Luciano Resende <luckbr1...@gmail.com> >> wrote: >> >>> >>> After initial discussion, please vote on the acceptance of SystemML >>> Project for incubation at the Apache Incubator. The full proposal is >>> available at the end of this message and on the wiki at : >>> >>> https://wiki.apache.org/incubator/SystemML >>> <http://wiki.apache.org/incubator/Nuvem> >>> >>> Please cast your votes: >>> >>> [ ] +1, bring SystemML into Incubator >>> [ ] +0, I don't care either way >>> [ ] -1, do not bring SystemML into Incubator, because... >>> >>> The vote is open for the next 72 hours and only votes from the >>> Incubator PMC are binding. >>> >>> >>> = SystemML = >>> >>> == Abstract == >>> >>> SystemML provides declarative large-scale machine learning (ML) that >>>aims >>> at flexible specification of ML algorithms and automatic generation of >>> hybrid runtime plans ranging from single node, in-memory computations, >>>to >>> distributed computations on Apache Hadoop MapReduce and Apache Spark. >>>ML >>> algorithms are expressed in an R-like syntax, that includes linear >>>algebra >>> primitives, statistical functions, and ML-specific constructs. This >>> high-level language significantly increases the productivity of data >>> scientists as it provides (1) full flexibility in expressing custom >>> analytics, and (2) data independence from the underlying input formats >>>and >>> physical data representations. Automatic optimization according to data >>> characteristics such as distribution on the disk file system, and >>>sparsity >>> as well as processing characteristics in the distributed environment >>>like >>> number of nodes, CPU, memory per node, ensures both efficiency and >>> scalability. >>> >>> == Proposal == >>> >>> The goal of SystemML is to create a commercial friendly, scalable and >>> extensible machine learning framework for data scientists to create or >>> extend machine learning algorithms using a declarative syntax. The >>>machine >>> learning framework enables data scientists to develop algorithms >>>locally >>> without the need of a distributed cluster, and scale up and scale out >>>the >>> execution of these algorithms to distributed Apache Hadoop MapReduce or >>> Apache Spark clusters. >>> >>> == Background == >>> >>> SystemML started as a research project in the IBM Almaden Research >>>Center >>> around 2007 aiming to enable data scientists to develop machine >>>learning >>> algorithms independent of data and cluster characteristics. >>> >>> == Rationale == >>> >>> SystemML enables the specification of machine learning algorithms >>>using a >>> declarative machine learning (DML) language. DML includes linear >>>algebra >>> primitives, statistical functions, and additional constructs. This >>> high-level language significantly increases the productivity of data >>> scientists as it provides (1) full flexibility in expressing custom >>> analytics and (2) data independence from the underlying input formats >>>and >>> physical data representations. >>> >>> SystemML computations can be executed in a variety of different modes. >>>It >>> supports single node in-memory computations and large-scale distributed >>> cluster computations. This allows the user to quickly prototype new >>> algorithms in local environments but automatically scale to large data >>> sizes as well without changing the algorithm implementation. >>> >>> Algorithms specified in DML are dynamically compiled and optimized >>>based >>> on data and cluster characteristics using rule-based and cost-based >>> optimization techniques. The optimizer automatically generates hybrid >>> runtime execution plans ranging from in-memory single-node execution to >>> distributed computations on Apache Spark or Apache Hadoop MapReduce. >>>This >>> ensures both efficiency and scalability. Automatic optimization >>>reduces or >>> eliminates the need to hand-tune distributed runtime execution plans >>>and >>> system configurations. >>> >>> == Initial Goals == >>> >>> The initial goals to move SystemML to the Apache Incubator is to >>>broaden >>> the community foster the contributions from data scientists to develop >>>new >>> machine learning algorithms and enhance the existing ones. Ultimately, >>>this >>> may lead to the creation of an industry standard in specifying machine >>> learning algorithms. >>> >>> == Current Status == >>> >>> The initial code has been developed at the IBM Almaden Research Center >>>in >>> California and has recently been made available in GitHub under the >>>Apache >>> Software License 2.0. The project currently supports a single node (in >>> memory computation) as well as distributed computations utilizing >>>Apache >>> Hadoop MapReduce or Apache Spark clusters. >>> >>> === Meritocracy === >>> >>> We plan to invest in supporting a meritocracy. We will discuss the >>> requirements in an open forum. Several companies have already expressed >>> interest in this project, and we intend to invite additional >>>developers to >>> participate. We will encourage and monitor community participation so >>>that >>> privileges can be extended to those that contribute operating to the >>> standard of meritocracy that Apache emphasizes. >>> >>> === Community === >>> >>> The need for a generic scalable and declarative machine learning >>>approach >>> in the open source is tremendous, so there is a potential for a very >>>large >>> community. We believe that SystemML¹s extensible architecture, >>>declarative >>> syntax, cost based optimizer and its alignment with Spark will further >>> encourage community participation not only in enhancing the >>>infrastructure >>> but also speed up the creation of algorithms for a wide range of use >>> cases. We expect that over time SystemML will attract a large >>>community. >>> >>> === Alignment === >>> >>> The initial committers strongly believe that a generic scalable and >>> declarative machine learning approach for machine learning will gain >>> broader adoption as an open source, community driven project, where the >>> community can contribute not only to the core components, but also to a >>> growing collection of algorithms which will leverage the optimizations >>>and >>> ease of scaling in SystemML. Our hope is that the Apache Spark, Apache >>> Hadoop and other communities will find tremendous value in SystemML and >>> this will foster further collaboration between these projects >>>furthering >>> the already existing integration points. >>> >>> == Known Risks == >>> >>> To-date, development has been sponsored by IBM and coordinated mostly >>>by >>> the core team of researchers at the IBM Almaden Research Center. >>> >>> For SystemML to fully transition to an "Apache Way" governance model, >>>it >>> needs to start embracing the meritocracy-centric way of growing the >>> community of contributors. >>> >>> === Orphaned Products === >>> >>> The SystemML developers and previous sponsor have a long-term interest >>>in >>> use and maintenance of the code and there is also hope that growing a >>> diverse community around the project will become a guarantee against >>>the >>> project becoming orphaned. We feel that it is also important to put >>>formal >>> governance in place both for the project and the contributors as the >>> project expands. We feel ASF is the best location for this. >>> >>> === Inexperience with Open Source === >>> >>> The current SystemML set of contributors are very diverse regarding >>> participation in Open Source. While some initial members are >>>experiencing >>> an open source project for the first time, others have been >>>contributing >>> and mentoring various Apache and non-Apache open source projects. >>> >>> === Reliance on Salaried Developers === >>> >>> SystemML currently receives substantial support from salaried >>>developers. >>> However, they are all passionate about the project, and we are >>>confident >>> that the project will continue even if no salaried developers >>>contribute to >>> the project. We are committed to recruiting additional committers >>>including >>> non-salaried developers. >>> >>> >>> === Relationships with Other Apache Products === >>> >>> Currently, SystemML integrates with Apache Hadoop MapReduce and Apache >>> Spark as underlying computational distributed runtimes. >>> >>> === An Excessive Fascination with the Apache Brand === >>> >>> SystemML solves a real need for generic scalable and declarative >>>machine >>> learning approach for machine learning in the Apache Hadoop and Spark >>> ecosystems, something that has been addressed in a very ad hoc manner >>>so >>> far by multiple Apache projects. Our rationale for developing SystemML >>>as >>> an Apache project is detailed in the Rationale section. We believe >>>that the >>> Apache brand and community process will help us attract more >>>contributors >>> to this project, and help establish ubiquitous APIs. >>> >>> >>> == Documentation == >>> >>> Documentation regarding SystemML is available in the current GitHub >>> repository >>>https://github.com/SparkTC/systemml/tree/master/system-ml/docs. >>> >>> >>> == Initial Source == >>> >>> Initial source is available on GitHub under the Apache License 2.0 >>> >>> https://github.com/SparkTC/systemml >>> >>> == Source and Intellectual Property Submission Plan == >>> >>> We know of no legal encumbrances in the transfer of source code and >>>rights >>> to Apache. In fact, given the internal IBM due diligence performed on >>>the >>> source code during open sourcing, we expect the code base to be free >>>from >>> any IP issues. >>> >>> == External Dependencies == >>> >>> SystemML is written in Java and currently supports Apache Hadoop >>>MapReduce >>> and Apache Spark runtimes. >>> >>> To the best of our knowledge, all dependencies of SystemML are >>>distributed >>> under Apache compatible licenses. Upon acceptance to the incubator, we >>> would begin a thorough analysis of all transitive dependencies to >>>verify >>> this fact and introduce license checking into the build and release >>>process >>> (for instance integrating Apache Rat). >>> >>> Cryptography >>> N/A >>> >>> == Required Resources == >>> >>> === Mailing lists === >>> * priv...@sysml.incubator.apache.org (moderated subscriptions) >>> * comm...@sysml.incubator.apache.org >>> * d...@sysml.incubator.apache.org >>> >>> === Git Repository === >>> * https://git-wip-us.apache.org/repos/asf/incubator-sysml.git >>> >>> === Issue Tracking === >>> * JIRA (SYSML) >>> >>> == Initial Committers == >>> >>> * Luciano Resende (lresende AT apache DOT org) >>> * Berthold Reinwald (reinwald AT us DOT ibm DOT com) >>> * Matthias Boehm (mboehm AT us DOT ibm DOT com) >>> * Shirish Tatikonda (statiko AT us DOT ibm DOT com) >>> * Niketan Pansare (npansar AT us DOT ibm DOT com) >>> * Prithviraj Sen (senp AT us DOT ibm DOT com) >>> * Alexandre V Evfimievski (evfimi AT us DOT ibm DOT com) >>> * Fred Reiss (frreiss AT us DOT ibm DOT com) >>> * Deron Eriksson (deron AT us DOT ibm DOT com) >>> * Arvind Surve (asurve AT us DOT ibm DOT com) >>> * Mike Dusenberry (mwdusenb AT us DOT ibm DOT com) >>> * Reynold Xin (rxin AT apache DOT org) >>> * Xiangrui Meng (meng AT apache DOT org) >>> * Joseph Bradley (jkbradley AT apache DOT org) >>> * Patrick Wendell (pwendell AT apache DOT org) >>> * Holden Karau (holden AT apache DOT org) >>> * DB Tsai (dbtsai AT apache DOT org) >>> >>> == Affiliations == >>> >>> * DataBricks: Reynold Xin, Xiangrui Meng, Joseph Bradley, Patrick >>>Wendell >>> * Netflix: DB Tsai >>> * IBM: Luciano Resende, Berthold Reinwald, Matthias Boehm, Shirish >>> Tatikonda, Niketan Pansare, Prithviraj Sen, Alexandre V Evfimievski, >>>Fred >>> Reiss, Deron Eriksson, Arvind Surve, Mike Dusenberry and Holden Karau. >>> >>> == Sponsors == >>> >>> === Champion === >>> * Luciano Resende >>> >>> === Nominated Mentors === >>> * Luciano Resende >>> * Reynold Xin >>> * Patrick Wendell >>> * Rich Bowen >>> >>> === Sponsoring Entity === >>> We would like to propose the Apache Incubator to sponsor this project. >> Off course, my +1 >> >> -- >> Luciano Resende >> http://people.apache.org/~lresende >> http://twitter.com/lresende1975 >> http://lresende.blogspot.com/ --------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org