Apache SystemML (incubating) is also a flexible, scalable machine learning system[4].
> Apache SystemML provides an optimal workplace for machine learning using big
> data. It can be run on top of Apache Spark, where it automatically scales
> your data, line by line, determining whether your code should be run on the
> driver or an Apache Spark cluster.

[4] https://systemml.apache.org/

Thanks,
- Tsuyoshi

On Fri, Jan 6, 2017 at 6:11 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Hi Henri,
>
> It's great news! Looking forward to MXNet coming to the Apache
> Incubator :-)
>
> Two minor comments:
>
>> We currently use GitHub to maintain our source code,
>> https://github.com/MXNet
>
> In my understanding, the following URL is the correct one:
> https://github.com/dmlc/mxnet
>
>> === Relationship with Other Apache Products ===
>
> As far as I know, there are two additional machine learning libraries
> beyond the projects you mentioned.
> Apache MADlib (incubating)[1] is a machine learning library which can
> run on SQL systems (Greenplum/Apache HAWQ (incubating)/PostgreSQL).
> Apache Hivemall (incubating)[2] is also a machine learning library,
> which runs on the Hadoop ecosystem: Apache Spark/Apache Hive/Apache
> Pig. Notably, the Hivemall project has a MIX server, a kind of
> parameter server used to exchange parameters between mappers[3].
>
> This is just for your information; I don't mean you should add these
> projects to your proposal.
>
> [1] http://madlib.incubator.apache.org/
> [2] https://hivemall.incubator.apache.org/
> [3] https://hivemall.incubator.apache.org/userguide/tips/mixserver.html
>
> Thanks,
> - Tsuyoshi
>
> On Fri, Jan 6, 2017 at 4:32 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>> This is great news and I am looking forward to it =)
>>
>> According to the proposal, the community wants to stick with GitHub
>> issues for tracking issues and bugs?
>> I suppose this needs a nod from Greg Stein as rep for Apache Infra to
>> confirm that this is OK for incubation and how it would impact
>> graduation.
>>
>> - Henry
>>
>> On Thu, Jan 5, 2017 at 9:12 PM, Henri Yandell <bay...@apache.org> wrote:
>>
>>> Hello Incubator,
>>>
>>> I'd like to propose a new incubator Apache MXNet podling.
>>>
>>> The existing MXNet project (http://mxnet.io - 1.5 years old, 15
>>> committers, 200 contributors) is very interested in joining Apache.
>>> MXNet is an open-source deep learning framework that allows you to
>>> define, train, and deploy deep neural networks on a wide array of
>>> devices, from cloud infrastructure to mobile devices.
>>>
>>> The wiki proposal page is located here:
>>>
>>> https://wiki.apache.org/incubator/MXNetProposal
>>>
>>> I've included the text below in case anyone wants to focus on parts
>>> of it in a reply.
>>>
>>> Looking forward to your thoughts, and to lots of interested Apache
>>> members volunteering to mentor the project in addition to Sebastian
>>> and myself.
>>>
>>> Currently the list of committers is based on the current active
>>> coders, so we're also very interested in hearing from anyone else who
>>> is interested in working on the project, be they current or future
>>> contributors!
>>>
>>> Thanks,
>>>
>>> Hen
>>> On behalf of the MXNet project
>>>
>>> ---------
>>>
>>> = MXNet: Apache Incubator Proposal =
>>>
>>> == Abstract ==
>>>
>>> MXNet is a flexible and efficient library for deep learning.
>>>
>>> == Proposal ==
>>>
>>> MXNet is an open-source deep learning framework that allows you to
>>> define, train, and deploy deep neural networks on a wide array of
>>> devices, from cloud infrastructure to mobile devices. It is highly
>>> scalable, allowing for fast model training, and supports a flexible
>>> programming model and multiple languages. MXNet allows you to mix
>>> symbolic and imperative programming flavors to maximize both
>>> efficiency and productivity. MXNet is built on a dynamic dependency
>>> scheduler that automatically parallelizes both symbolic and imperative
>>> operations on the fly. A graph optimization layer on top of that makes
>>> symbolic execution fast and memory efficient. The MXNet library is
>>> portable and lightweight, and it scales to multiple GPUs and multiple
>>> machines.
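>>>
>>> A minimal sketch of mixing the two flavors, assuming MXNet's Python
>>> package (the shapes and layer sizes here are arbitrary):
>>>
>>> import mxnet as mx
>>>
>>> # Imperative flavor: NDArray operations execute eagerly, while the
>>> # dependency engine schedules independent operations in parallel.
>>> a = mx.nd.ones((2, 3))
>>> b = a * 2 + 1                 # runs right away (asynchronously scheduled)
>>> print(b.asnumpy())            # asnumpy() blocks until the result is ready
>>>
>>> # Symbolic flavor: declare a graph first, then bind and execute it,
>>> # letting the graph optimization layer plan memory ahead of time.
>>> data = mx.sym.Variable('data')
>>> fc = mx.sym.FullyConnected(data=data, num_hidden=10)
>>> net = mx.sym.SoftmaxOutput(data=fc, name='softmax')
>>>
>>> exe = net.simple_bind(ctx=mx.cpu(), data=(32, 100))
>>> exe.forward(is_train=False, data=mx.nd.ones((32, 100)))
>>> print(exe.outputs[0].shape)   # (32, 10)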
>>>
>>> == Background ==
>>>
>>> Deep learning is a subset of machine learning and refers to a class of
>>> algorithms that use a hierarchical approach with non-linearities to
>>> discover and learn representations within data. Deep learning has
>>> recently become very popular due to its applicability to, and
>>> advancement of, domains such as computer vision, speech recognition,
>>> natural language understanding and recommender systems. With pervasive
>>> and cost-effective cloud computing, large labeled datasets and
>>> continued algorithmic innovation, deep learning has become one of the
>>> most popular classes of algorithms for machine learning practitioners
>>> in recent years.
>>>
>>> == Rationale ==
>>>
>>> The adoption of deep learning is quickly expanding from the initial
>>> deep domain experts rooted in academia to data scientists and
>>> developers working to deploy intelligent services and products. Deep
>>> learning, however, has many challenges. These include model training
>>> time (which can take days to weeks), programmability (not everyone
>>> writes Python or C++, or likes symbolic programming) and balancing
>>> production readiness (support for things like failover) with
>>> development flexibility (the ability to program in different ways,
>>> support for new operators and model types) and speed of execution
>>> (fast and scalable model training). Other frameworks excel on some,
>>> but not all, of these aspects.
>>>
>>> == Initial Goals ==
>>>
>>> MXNet is a fairly established project on GitHub, with its first code
>>> contribution in April 2015 and roughly 200 contributors. It is used by
>>> several large companies and some of the top research institutions on
>>> the planet. Initial goals would be the following:
>>>
>>> 1. Move the existing codebase(s) to Apache
>>> 1. Integrate with the Apache development process/sign CLAs
>>> 1. Ensure all dependencies are compliant with Apache License version 2.0
>>> 1. Incremental development and releases per Apache guidelines
>>> 1. Establish engineering discipline and a predictable cadence of
>>> high-quality releases
>>> 1. Expand the community beyond the current base of expert-level users
>>> 1. Improve usability and the overall developer/user experience
>>> 1. Add additional functionality to address newer problem types and
>>> algorithms
>>>
>>> == Current Status ==
>>>
>>> === Meritocracy ===
>>>
>>> The MXNet project already operates on meritocratic principles. Today,
>>> MXNet has developers worldwide and has accepted multiple major patches
>>> from a diverse set of contributors within both industry and academia.
>>> We would like to follow ASF meritocratic principles to encourage more
>>> developers to contribute to this project. We know that only active and
>>> committed developers from a diverse set of backgrounds can make MXNet
>>> a successful project.
>>> We are also improving the documentation and code to help new
>>> developers get started quickly.
>>>
>>> === Community ===
>>>
>>> Acceptance into the Apache foundation would bolster the growing user
>>> and developer community around MXNet. That community includes around
>>> 200 contributors from academia and industry. The core developers of
>>> our project are listed in our contributors below and are also
>>> represented by logos on the mxnet.io site, including Amazon, Baidu,
>>> Carnegie Mellon University, Turi, Intel, NYU, Nvidia, MIT, Microsoft,
>>> TuSimple, University of Alberta, University of Washington and Wolfram.
>>>
>>> === Core Developers ===
>>>
>>> (with GitHub logins)
>>>
>>> * Tianqi Chen (@tqchen)
>>> * Mu Li (@mli)
>>> * Junyuan Xie (@piiswrong)
>>> * Bing Xu (@antinucleon)
>>> * Chiyuan Zhang (@pluskid)
>>> * Minjie Wang (@jermainewang)
>>> * Naiyan Wang (@winstywang)
>>> * Yizhi Liu (@javelinjs)
>>> * Tong He (@hetong007)
>>> * Qiang Kou (@thirdwing)
>>> * Xingjian Shi (@sxjscience)
>>>
>>> === Alignment ===
>>>
>>> The ASF is already the home of many distributed platforms, e.g.,
>>> Hadoop, Spark and Mahout, each of which targets a different
>>> application domain. MXNet, being a distributed platform for
>>> large-scale deep learning, focuses on another important domain, one
>>> that still lacks a scalable, programmable, flexible and fast
>>> open-source platform. The recent success of deep learning models,
>>> especially for vision and speech recognition tasks, has generated
>>> interest in both applying existing deep learning models and developing
>>> new ones. Thus, an open-source platform for deep learning backed by
>>> some of the top industry and academic players will be able to attract
>>> a large community of users and developers. MXNet is a complex system
>>> needing many iterations of design, implementation and testing.
>>> Apache's collaboration framework, which encourages active contribution
>>> from developers, will inevitably help improve the quality of the
>>> system, as shown by the success of Hadoop, Spark, etc. Equally
>>> important is the community of users, which helps identify real-life
>>> applications of deep learning and helps evaluate the system's
>>> performance and ease of use. We hope to leverage the ASF for
>>> coordinating and promoting both communities, and in return to benefit
>>> the communities with another useful tool.
>>>
>>> == Known Risks ==
>>>
>>> === Orphaned products ===
>>>
>>> Given the current level of investment in MXNet and the stakeholders
>>> using it, the risk of the project being abandoned is minimal. Amazon,
>>> for example, is actively working to use MXNet in many of its services,
>>> and many large corporations use it in their production applications.
>>>
>>> === Inexperience with Open Source ===
>>>
>>> MXNet has existed as a healthy open source project for more than a
>>> year. During that time, the project has attracted 200+ contributors.
>>>
>>> === Homogenous Developers ===
>>>
>>> The initial list of committers and contributors includes developers
>>> from several institutions and industry participants (see above).
>>>
>>> === Reliance on Salaried Developers ===
>>>
>>> Like most open source projects, MXNet receives substantial support
>>> from salaried developers. A large fraction of MXNet development is
>>> carried out by graduate students at various universities in the course
>>> of research degrees - this is more of a "volunteer" relationship,
>>> since in most cases students contribute vastly more than is necessary
>>> to immediately support their research. In addition, those working from
>>> within corporations devote significant time and effort to the project
>>> - and they come from several organizations.
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>>
>>> We did not choose Apache for publicity; we have two purposes. First,
>>> we hope that Apache's known best practices for managing a mature open
>>> source project can help guide us. For example, we are feeling the
>>> growing pains of a successful open source project as we attempt a
>>> major refactor of the internals while customers are using the system
>>> in production. We seek guidance in communicating breaking API changes
>>> and version revisions. Also, as our involvement from major
>>> corporations increases, we want to assure our users that MXNet will
>>> stay open and not favor any particular platform or environment. These
>>> are some examples of the know-how and discipline we're hoping Apache
>>> can bring to our project.
>>>
>>> Second, we want to leverage Apache's reputation to recruit more
>>> developers and create a diverse community.
>>>
>>> === Relationship with Other Apache Products ===
>>>
>>> Apache Mahout and Apache Spark's MLlib are general machine learning
>>> systems. Deep learning algorithms can thus be implemented on these two
>>> platforms as well. However, in practice, the overlap will be minimal.
>>> Deep learning is so computationally intensive that it often requires
>>> specialized GPU hardware to accomplish tasks of meaningful size.
>>> Making efficient use of GPU hardware is complex because the hardware
>>> is so fast that the supporting systems around it must be carefully
>>> optimized to keep the GPU cores busy. Extending this capability to
>>> distributed multi-GPU and multi-host environments requires great care.
>>> This is a critical differentiator between MXNet and existing Apache
>>> machine learning systems.
>>>
>>> Mahout and Spark MLlib follow models where their nodes run
>>> synchronously. This is the fundamental difference from MXNet, which
>>> follows the parameter server framework: MXNet can run synchronously or
>>> asynchronously. In addition, MXNet has optimizations for training a
>>> wide range of deep learning models using a variety of approaches
>>> (e.g., model parallelism and data parallelism), which makes MXNet much
>>> more efficient (near-linear speedup on state-of-the-art models). MXNet
>>> also supports both imperative and symbolic approaches, providing ease
>>> of programming for deep learning algorithms.
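>>>
>>> A minimal sketch of the parameter-server interface just described
>>> (MXNet's KVStore, in Python; the key and values are arbitrary):
>>>
>>> import mxnet as mx
>>>
>>> # 'local' runs in-process; in a cluster, creating the store with
>>> # 'dist_sync' or 'dist_async' instead selects synchronous vs.
>>> # asynchronous parameter updates.
>>> kv = mx.kv.create('local')
>>>
>>> shape = (2, 3)
>>> kv.init(3, mx.nd.ones(shape))       # register key 3 with an initial value
>>> kv.push(3, mx.nd.ones(shape) * 8)   # send a gradient-style update
>>>
>>> out = mx.nd.zeros(shape)
>>> kv.pull(3, out=out)                 # fetch the aggregated value
>>> print(out.asnumpy())                # all entries are 8.0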
>>>
>>> Other Apache projects that are potentially complementary:
>>>
>>> Apache Arrow - reading data in Apache Arrow's internal format from
>>> MXNet would allow users to run ETL/preprocessing in Spark, save the
>>> results in Arrow's format and then run DL algorithms on them (a sketch
>>> follows this list).
>>>
>>> Apache Singa - MXNet and Singa are both deep learning projects, and
>>> can benefit from a larger deep learning community at Apache.
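>>>
>>> Such an integration does not exist yet; a hypothetical sketch,
>>> assuming the pyarrow package and going through NumPy as the common
>>> format (the file name is illustrative):
>>>
>>> import pyarrow.parquet as pq
>>> import mxnet as mx
>>>
>>> # e.g. a Parquet file written out by a Spark ETL job
>>> table = pq.read_table('features.parquet')
>>> batch = table.to_pandas().values    # Arrow -> pandas -> NumPy
>>> nd = mx.nd.array(batch)             # NumPy -> MXNet NDArray, ready for DL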
>>>
>>> == Documentation ==
>>>
>>> Documentation has recently migrated to http://mxnet.io. We continue to
>>> refine and improve the documentation.
>>>
>>> == Initial Source ==
>>>
>>> We currently use GitHub to maintain our source code,
>>> https://github.com/MXNet
>>>
>>> == Source and Intellectual Property Submission Plan ==
>>>
>>> MXNet code is available under the Apache License, Version 2.0. We will
>>> work with the committers to get CLAs signed and review previous
>>> contributions.
>>>
>>> == External Dependencies ==
>>>
>>> * required by the core code base: GCC or Clang, any BLAS library
>>> (ATLAS, OpenBLAS, MKL), dmlc-core, mshadow, ps-lite (which requires
>>> lib-zeromq), TBB
>>> * required for GPU usage: cudnn, cuda
>>> * required for Python usage: Python 2/3
>>> * required for R module: R, Rcpp (GPLv2 licensing)
>>> * optional for image preparation and preprocessing: opencv
>>> * optional dependencies for additional features: torch7, numba, cython
>>> (in NNVM branch)
>>>
>>> Rcpp and lib-zeromq are expected to require licensing discussions.
>>>
>>> == Cryptography ==
>>>
>>> Not applicable.
>>>
>>> == Required Resources ==
>>>
>>> === Mailing Lists ===
>>>
>>> There is currently no mailing list.
>>>
>>> === Issue Tracking ===
>>>
>>> Currently uses GitHub to track issues. Would like to continue to do
>>> so.
>>>
>>> == Committers and Affiliations ==
>>>
>>> * Tianqi Chen (UW)
>>> * Mu Li (AWS)
>>> * Junyuan Xie (AWS)
>>> * Bing Xu (Apple)
>>> * Chiyuan Zhang (MIT)
>>> * Minjie Wang (NYU)
>>> * Naiyan Wang (TuSimple)
>>> * Yizhi Liu (Mediav)
>>> * Tong He (Simon Fraser University)
>>> * Qiang Kou (Indiana U)
>>> * Xingjian Shi (HKUST)
>>>
>>> == Sponsors ==
>>>
>>> === Champion ===
>>>
>>> Henri Yandell (bayard at apache.org)
>>>
>>> === Nominated Mentors ===
>>>
>>> Sebastian Schelter (s...@apache.org)
>>>
>>> === Sponsoring Entity ===
>>>
>>> We are requesting the Incubator to sponsor this project.

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org