+1 from me. Thanks!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----Original Message-----
From: "Manoharan, Arun" <armanoha...@ebay.com>
Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
Date: Tuesday, October 27, 2015 at 11:36 PM
To: "general@incubator.apache.org" <general@incubator.apache.org>
Subject: Re: [VOTE] Accept SystemML into Apache Incubator

>+1 (Non-binding)
>
>On 10/27/15, 11:08 PM, "Raymond Feng" <enjoyj...@gmail.com> wrote:
>
>>Luciano,
>>
>>There is a copy/paste error pointing to http://wiki.apache.org/incubator/Nuvem.
>>
>>Sent from my iPhone 6 Plus
>>
>>> On Oct 27, 2015, at 10:03 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
>>>
>>> On Tue, Oct 27, 2015 at 9:52 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
>>>
>>>>
>>>> After initial discussion, please vote on the acceptance of SystemML Project for incubation at the Apache Incubator. The full proposal is available at the end of this message and on the wiki at:
>>>>
>>>> https://wiki.apache.org/incubator/SystemML
>>>> <http://wiki.apache.org/incubator/Nuvem>
>>>>
>>>> Please cast your votes:
>>>>
>>>> [ ] +1, bring SystemML into Incubator
>>>> [ ] +0, I don't care either way
>>>> [ ] -1, do not bring SystemML into Incubator, because...
>>>>
>>>> The vote is open for the next 72 hours and only votes from the Incubator PMC are binding.
>>>>
>>>>
>>>> = SystemML =
>>>>
>>>> == Abstract ==
>>>>
>>>> SystemML provides declarative large-scale machine learning (ML) that aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single node, in-memory computations, to distributed computations on Apache Hadoop MapReduce and Apache Spark. ML algorithms are expressed in an R-like syntax that includes linear algebra primitives, statistical functions, and ML-specific constructs.
>>>> This high-level language significantly increases the productivity of data scientists as it provides (1) full flexibility in expressing custom analytics, and (2) data independence from the underlying input formats and physical data representations. Automatic optimization according to data characteristics (such as distribution on the disk file system and sparsity) and processing characteristics of the distributed environment (such as number of nodes, CPU, and memory per node) ensures both efficiency and scalability.
>>>>
>>>> == Proposal ==
>>>>
>>>> The goal of SystemML is to create a commercially friendly, scalable, and extensible machine learning framework for data scientists to create or extend machine learning algorithms using a declarative syntax. The machine learning framework enables data scientists to develop algorithms locally without the need for a distributed cluster, and to scale up and scale out the execution of these algorithms to distributed Apache Hadoop MapReduce or Apache Spark clusters.
>>>>
>>>> == Background ==
>>>>
>>>> SystemML started as a research project at the IBM Almaden Research Center around 2007, aiming to enable data scientists to develop machine learning algorithms independent of data and cluster characteristics.
>>>>
>>>> == Rationale ==
>>>>
>>>> SystemML enables the specification of machine learning algorithms using a declarative machine learning (DML) language. DML includes linear algebra primitives, statistical functions, and additional constructs. This high-level language significantly increases the productivity of data scientists as it provides (1) full flexibility in expressing custom analytics and (2) data independence from the underlying input formats and physical data representations.
>>>>
>>>> SystemML computations can be executed in a variety of different modes.
>>>> It supports single node in-memory computations and large-scale distributed cluster computations. This allows the user to quickly prototype new algorithms in local environments and then automatically scale to large data sizes without changing the algorithm implementation.
>>>>
>>>> Algorithms specified in DML are dynamically compiled and optimized based on data and cluster characteristics using rule-based and cost-based optimization techniques. The optimizer automatically generates hybrid runtime execution plans ranging from in-memory single-node execution to distributed computations on Apache Spark or Apache Hadoop MapReduce. This ensures both efficiency and scalability. Automatic optimization reduces or eliminates the need to hand-tune distributed runtime execution plans and system configurations.
>>>>
>>>> == Initial Goals ==
>>>>
>>>> The initial goal of moving SystemML to the Apache Incubator is to broaden the community and foster contributions from data scientists to develop new machine learning algorithms and enhance the existing ones. Ultimately, this may lead to the creation of an industry standard for specifying machine learning algorithms.
>>>>
>>>> == Current Status ==
>>>>
>>>> The initial code has been developed at the IBM Almaden Research Center in California and has recently been made available on GitHub under the Apache License 2.0. The project currently supports a single node (in-memory computation) as well as distributed computations utilizing Apache Hadoop MapReduce or Apache Spark clusters.
>>>>
>>>> === Meritocracy ===
>>>>
>>>> We plan to invest in supporting a meritocracy. We will discuss the requirements in an open forum. Several companies have already expressed interest in this project, and we intend to invite additional developers to participate.
>>>> We will encourage and monitor community participation so that privileges can be extended to those who contribute in line with the standard of meritocracy that Apache emphasizes.
>>>>
>>>> === Community ===
>>>>
>>>> The need for a generic, scalable, and declarative machine learning approach in open source is tremendous, so there is potential for a very large community. We believe that SystemML's extensible architecture, declarative syntax, cost-based optimizer, and alignment with Spark will further encourage community participation, not only in enhancing the infrastructure but also in speeding up the creation of algorithms for a wide range of use cases. We expect that over time SystemML will attract a large community.
>>>>
>>>> === Alignment ===
>>>>
>>>> The initial committers strongly believe that a generic, scalable, and declarative approach to machine learning will gain broader adoption as an open source, community-driven project, where the community can contribute not only to the core components, but also to a growing collection of algorithms which will leverage the optimizations and ease of scaling in SystemML. Our hope is that the Apache Spark, Apache Hadoop, and other communities will find tremendous value in SystemML and that this will foster further collaboration between these projects, furthering the already existing integration points.
>>>>
>>>> == Known Risks ==
>>>>
>>>> To date, development has been sponsored by IBM and coordinated mostly by the core team of researchers at the IBM Almaden Research Center.
>>>>
>>>> For SystemML to fully transition to an "Apache Way" governance model, it needs to start embracing the meritocracy-centric way of growing the community of contributors.
>>>>
>>>> === Orphaned Products ===
>>>>
>>>> The SystemML developers and previous sponsor have a long-term interest in the use and maintenance of the code, and there is also hope that growing a diverse community around the project will guard against the project becoming orphaned. We feel that it is also important to put formal governance in place both for the project and the contributors as the project expands. We feel ASF is the best location for this.
>>>>
>>>> === Inexperience with Open Source ===
>>>>
>>>> The current set of SystemML contributors is very diverse in its open source experience. While some initial members are experiencing an open source project for the first time, others have been contributing to and mentoring various Apache and non-Apache open source projects.
>>>>
>>>> === Reliance on Salaried Developers ===
>>>>
>>>> SystemML currently receives substantial support from salaried developers. However, they are all passionate about the project, and we are confident that the project will continue even if no salaried developers contribute to the project. We are committed to recruiting additional committers including non-salaried developers.
>>>>
>>>>
>>>> === Relationships with Other Apache Products ===
>>>>
>>>> Currently, SystemML integrates with Apache Hadoop MapReduce and Apache Spark as underlying computational distributed runtimes.
>>>>
>>>> === An Excessive Fascination with the Apache Brand ===
>>>>
>>>> SystemML solves a real need for a generic, scalable, and declarative approach to machine learning in the Apache Hadoop and Spark ecosystems, something that has been addressed in a very ad hoc manner so far by multiple Apache projects. Our rationale for developing SystemML as an Apache project is detailed in the Rationale section.
>>>> We believe that the Apache brand and community process will help us attract more contributors to this project, and help establish ubiquitous APIs.
>>>>
>>>>
>>>> == Documentation ==
>>>>
>>>> Documentation regarding SystemML is available in the current GitHub repository https://github.com/SparkTC/systemml/tree/master/system-ml/docs.
>>>>
>>>>
>>>> == Initial Source ==
>>>>
>>>> Initial source is available on GitHub under the Apache License 2.0
>>>>
>>>> https://github.com/SparkTC/systemml
>>>>
>>>> == Source and Intellectual Property Submission Plan ==
>>>>
>>>> We know of no legal encumbrances in the transfer of source code and rights to Apache. In fact, given the internal IBM due diligence performed on the source code during open sourcing, we expect the code base to be free from any IP issues.
>>>>
>>>> == External Dependencies ==
>>>>
>>>> SystemML is written in Java and currently supports Apache Hadoop MapReduce and Apache Spark runtimes.
>>>>
>>>> To the best of our knowledge, all dependencies of SystemML are distributed under Apache-compatible licenses. Upon acceptance to the incubator, we would begin a thorough analysis of all transitive dependencies to verify this fact and introduce license checking into the build and release process (for instance, by integrating Apache Rat).
>>>>
>>>> Cryptography
>>>> N/A
>>>>
>>>> == Required Resources ==
>>>>
>>>> === Mailing lists ===
>>>> * priv...@sysml.incubator.apache.org (moderated subscriptions)
>>>> * comm...@sysml.incubator.apache.org
>>>> * d...@sysml.incubator.apache.org
>>>>
>>>> === Git Repository ===
>>>> * https://git-wip-us.apache.org/repos/asf/incubator-sysml.git
>>>>
>>>> === Issue Tracking ===
>>>> * JIRA (SYSML)
>>>>
>>>> == Initial Committers ==
>>>>
>>>> * Luciano Resende (lresende AT apache DOT org)
>>>> * Berthold Reinwald (reinwald AT us DOT ibm DOT com)
>>>> * Matthias Boehm (mboehm AT us DOT ibm DOT com)
>>>> * Shirish Tatikonda (statiko AT us DOT ibm DOT com)
>>>> * Niketan Pansare (npansar AT us DOT ibm DOT com)
>>>> * Prithviraj Sen (senp AT us DOT ibm DOT com)
>>>> * Alexandre V Evfimievski (evfimi AT us DOT ibm DOT com)
>>>> * Fred Reiss (frreiss AT us DOT ibm DOT com)
>>>> * Deron Eriksson (deron AT us DOT ibm DOT com)
>>>> * Arvind Surve (asurve AT us DOT ibm DOT com)
>>>> * Mike Dusenberry (mwdusenb AT us DOT ibm DOT com)
>>>> * Reynold Xin (rxin AT apache DOT org)
>>>> * Xiangrui Meng (meng AT apache DOT org)
>>>> * Joseph Bradley (jkbradley AT apache DOT org)
>>>> * Patrick Wendell (pwendell AT apache DOT org)
>>>> * Holden Karau (holden AT apache DOT org)
>>>> * DB Tsai (dbtsai AT apache DOT org)
>>>>
>>>> == Affiliations ==
>>>>
>>>> * DataBricks: Reynold Xin, Xiangrui Meng, Joseph Bradley, Patrick Wendell
>>>> * Netflix: DB Tsai
>>>> * IBM: Luciano Resende, Berthold Reinwald, Matthias Boehm, Shirish Tatikonda, Niketan Pansare, Prithviraj Sen, Alexandre V Evfimievski, Fred Reiss, Deron Eriksson, Arvind Surve, Mike Dusenberry and Holden Karau.
>>>>
>>>> == Sponsors ==
>>>>
>>>> === Champion ===
>>>> * Luciano Resende
>>>>
>>>> === Nominated Mentors ===
>>>> * Luciano Resende
>>>> * Reynold Xin
>>>> * Patrick Wendell
>>>> * Rich Bowen
>>>>
>>>> === Sponsoring Entity ===
>>>> We would like to propose the Apache Incubator to sponsor this project.
>>> Of course, my +1
>>>
>>> --
>>> Luciano Resende
>>> http://people.apache.org/~lresende
>>> http://twitter.com/lresende1975
>>> http://lresende.blogspot.com/