Thanks for raising this issue. I agree that committer diversity is important for the long-term success of a project. I think that should be a criterion for graduation from the incubator. I think it is going to be easier to find new contributors as an Apache incubator project.
On Tue, Mar 10, 2015 at 9:09 AM, jan i <j...@apache.org> wrote:
>
> +0 I am really concerned about the diversity of the initial committers,
> what happens if the university pulls the plug. I know we all say it will
> never happen, but it could happen.
>
> rgds
> jan i.
>
>
> On 10 March 2015 at 16:20, Alan Gates <alanfga...@gmail.com> wrote:
>
>> +1
>>
>> Alan.
>>
>> Thejas Nair <thejas.n...@gmail.com>
>> March 10, 2015 at 7:33
>> The Singa Incubator Proposal document has been updated based on
>> feedback in the proposal thread.
>>
>> This vote is proposing the inclusion of Apache Singa as incubator project.
>> The vote will run for at least 72 hours.
>>
>> [ ] +1 Accept Apache Singa into the Incubator
>> [ ] +0 Don’t care.
>> [ ] -1 Don’t accept Apache Singa into the Incubator because..
>>
>> Please vote !
>>
>> Here is my +1 .
>>
>> Link to version of proposal being voted on :
>> https://wiki.apache.org/incubator/SingaProposal?action=recall&rev=10
>>
>> The text is below
>> ----------------------------------------------
>>
>> = Singa Incubator Proposal =
>> == Abstract ==
>> SINGA is a distributed deep learning platform.
>>
>> == Proposal ==
>> SINGA is an efficient, scalable and easy-to-use distributed platform
>> for training deep learning models, e.g., Deep Convolutional Neural
>> Network and Deep Belief Network. It parallelizes the computation (i.e.,
>> training) onto a cluster of nodes by distributing the training data and
>> model automatically to speed up the training. Built-in training
>> algorithms like Back-Propagation and Contrastive Divergence are
>> implemented based on common abstractions of deep learning models. Users
>> can train their own deep learning models by simply customizing these
>> abstractions like implementing the Mapper and Reducer in Hadoop.
>>
>> == Background ==
>> Deep learning refers to a set of feature (or representation) learning
>> models that consist of multiple (non-linear) layers, where different
>> layers learn different levels of abstractions (representations) of the
>> raw input data. Larger (in terms of model parameters) and deeper (in
>> terms of number of layers) models have shown better performance, e.g.,
>> lower image classification error in the Large Scale Visual Recognition
>> Challenge. However, a larger model requires more memory and larger
>> training data to reduce over-fitting. Complex numeric operations make
>> the training computation intensive. In practice, training large deep
>> learning models takes weeks or months on a single node (even with a GPU).
>>
>> == Rationale ==
>> Deep learning has gained a lot of attention in both academia and
>> industry due to its success in a wide range of areas such as computer
>> vision and speech recognition. However, training of such models is
>> computationally expensive, especially for large and deep models (e.g.,
>> with billions of parameters and more than 10 layers). Both Google and
>> Microsoft have developed distributed deep learning systems to make the
>> training more efficient by distributing the computations within a
>> cluster of nodes. However, these systems are closed-source software. Our
>> goal is to leverage the community of open source developers to make
>> SINGA efficient, scalable and easy to use. SINGA is a full-fledged
>> distributed platform that could benefit the community and also benefit
>> from the community's involvement in contributing to further work in this
>> area. We believe the nature of SINGA and our vision for the system fit
>> naturally with Apache's philosophy and development framework.
>>
>> == Initial Goals ==
>> We have developed a system for SINGA running on a commodity computer
>> cluster.
>> The initial goals include,
>> * improving the system in terms of scalability and efficiency, e.g.,
>> using Infiniband for network communication and multi-threading for one
>> node computation. We would consider extending SINGA to GPU clusters
>> later.
>> * benchmarking with larger datasets (hundreds of millions of training
>> instances) and models (billions of parameters).
>> * adding more built-in deep learning models. Users can train the
>> built-in models on their datasets directly.
>>
>>
>> == Current Status ==
>> === Meritocracy ===
>> We would like to follow ASF meritocratic principles to encourage more
>> developers to contribute to this project. We know that only active and
>> excellent developers can make SINGA a successful project. The committer
>> list and PMC will be updated based on developers' performance and
>> commitment. We are also improving the documentation and code to help new
>> developers get started quickly.
>>
>> === Community ===
>> SINGA is currently being developed in the Database System Research Lab
>> at the National University of Singapore (NUS) in collaboration with
>> Zhejiang University in China. Our lab has extensive experience in
>> building database related systems, including distributed systems. Six
>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng, Sheng
>> Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research fellow (Anh
>> Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian Lee Tan) have
>> been working for a year on this project. We are open to recruiting more
>> developers from diverse backgrounds.
>>
>> === Core Developers ===
>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have worked
>> on distributed systems for more than 20 years. They have collaborated
>> with the industry and have built various large scale systems. Anh Dinh's
>> research is also on distributed systems, albeit with more focus on
>> security aspects.
>> Wei Wang's research is on deep learning problems including deep
>> learning applications and large scale training. Sheng Wang and Jinyang
>> are working on efficient indexing, querying of large scale data and
>> machine learning. Kaiping, Zhaojing and Zhongle are new PhD students who
>> joined SINGA recently. They will work on this project for a longer time
>> (next 4-5 years). While we share common research interests, each member
>> also brings diverse expertise to the team.
>>
>> === Alignment ===
>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>> Spark and Mahout, each of which targets a different application domain.
>> SINGA, being a distributed platform for large-scale deep learning,
>> focuses on another important domain which still lacks a robust and
>> scalable open-source platform. The recent success of deep learning
>> models, especially for vision and speech recognition tasks, has
>> generated interest both in applying existing deep learning models and in
>> developing new ones. Thus, an open-source platform for deep learning
>> will be able to attract a large community of users and developers. SINGA
>> is a complex system needing many iterations of design, implementation
>> and testing. Apache's collaboration framework, which encourages active
>> contribution from developers, will inevitably help improve the quality
>> of the system, as shown in the success of Hadoop, Spark, etc. Equally
>> important is the community of users, which helps identify real-life
>> applications of deep learning, and helps to evaluate the system's
>> performance and ease-of-use. We hope to leverage ASF for coordinating
>> and promoting both communities, and in return benefit the communities
>> with another useful tool.
>>
>> == Known Risks ==
>> === Orphaned products ===
>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
>> the lab in two to four years' time.
>> It is possible that some of them may not have enough time to focus on
>> this project after that. But SINGA is part of our other bigger research
>> projects on building an infrastructure for data intensive applications,
>> which include health-care analytics and brain-inspired computing. Beng
>> Chin and Kian Lee would continue working on it and getting more people
>> involved. For example, three new developers (Kaiping, Zhaojing and
>> Zhongle) joined us recently. Individual developers are welcome to make
>> SINGA a diverse community that is robust and independent from any single
>> developer.
>>
>> === Inexperience with Open Source ===
>> All the developers are active users and followers of open source
>> projects. Our research lab has a strong commitment to open source, and
>> has released the source code of several systems under open source
>> license as a way of contributing back to the open source community. But
>> we do not have much real experience in open source projects with large
>> and well organized communities like those in Apache. This is one reason
>> we chose Apache, which is experienced in open source project incubation.
>> We hope to get help from Apache (e.g., champion and mentors) to
>> establish a healthy path for SINGA.
>>
>> === Homogenous Developers ===
>> Although the current developers are researchers in the universities,
>> they have different research interests and project experiences, as
>> mentioned in the section that introduces the core developers. We know
>> that a diverse community is helpful. Hence we are open to the idea of
>> recruiting developers from other regions and organizations.
>>
>> === Reliance on Salaried Developers ===
>> As a research project in the university, SINGA's current developing
>> community consists of professors, PhD students, research assistants and
>> postdoctoral fellows.
>> They are driven by their interests to work on this project and have
>> contributed actively since the start of the project. The research
>> assistants and fellows are expected to leave when their contracts
>> expire. However, they are keen to continue to work on the project
>> voluntarily. Moreover, as a long term research project, new research
>> assistants and fellows are likely to join the project.
>>
>> === An Excessive Fascination with the Apache Brand ===
>> We choose Apache not for publicity. We have two purposes. First, we want
>> to leverage Apache's reputation to recruit more developers to make a
>> diverse community. Second, we hope that Apache can help us to establish
>> a healthy path in developing SINGA. Beng Chin and Kian-Lee are
>> established database and distributed system researchers, and together
>> with the other contributors, they sincerely believe that there is a need
>> for a widely accepted open source distributed deep learning platform.
>> The field of deep learning is still in its infancy, and an open source
>> platform will fuel the research in the area. Moreover, such a platform
>> will enable researchers to develop new models and algorithms, rather
>> than spending time implementing a deep learning system from scratch.
>> Furthermore, the need for scalability for such a platform is obvious.
>>
>> === Relationship with Other Apache Products ===
>> Apache Mahout and Apache Spark's ML-LIB are general machine learning
>> systems. Deep learning algorithms can thus be implemented on these two
>> platforms as well. However, there are differences in training
>> efficiency, scalability and usability. Mahout and Spark ML-LIB follow
>> models where their nodes run synchronously. This is the fundamental
>> difference from Singa, which follows the parameter server framework
>> (like Google Brain and Microsoft Adam). Singa can run synchronously or
>> asynchronously.
>> The asynchronous mode is superior to the synchronous mode in terms of
>> scalability. In addition, Singa has some optimizations towards deep
>> learning models (e.g., model parallelism, data parallelism and
>> hybrid-parallelism) which make Singa more efficient. We also provide an
>> easy-to-use programming model for deep learning algorithms.
>>
>> There are also plans for integration with Apache Hadoop's HDFS as
>> storage, to handle large training data.
>> Specifically, we store the training data (e.g., images or raw features
>> of images) in HDFS, then (pre-)fetch them online.
>> We will also explore integration with Hadoop's Yarn and Apache Mesos
>> to do resource management.
>>
>>
>> == Documentation ==
>> The project is hosted at
>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>> Documentation can be found at the GitHub Wiki Page:
>> https://github.com/nusinga/singa/wiki.
>> We continue to refine and improve the documentation.
>>
>> == Initial Source ==
>> We use GitHub to maintain our source code,
>> https://github.com/nusinga/singa
>>
>> == Source and Intellectual Property Submission Plan ==
>> We plan to release our code base under the Apache License, Version 2.0.
>>
>> == External Dependencies ==
>> * required by the core code base: glog, gflags, google protobuf,
>> open-blas, mpich, armci-mpi.
>> * required by data preparation and preprocessing: opencv, hdfs, python.
>>
>> == Cryptography ==
>> Not Applicable
>>
>> == Required Resources ==
>> === Mailing Lists ===
>> Currently, we use a Google group for internal discussion. The mailing
>> address is nusi...@googlegroup.com. We will migrate the content to the
>> Apache mailing lists in the future.
>>
>> * singa-dev
>> * singa-user
>> * singa-commits
>> * singa-private (for private discussion within PMC)
>>
>> === Git Repository ===
>> We want to continue using git for version control. Hence, a git repo
>> is required.
>>
>> === Issue Tracking ===
>> JIRA Singa (SINGA)
>>
>> == Initial Committers ==
>> * Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>> * Kian Lee Tan (tankl @comp.nus.edu.sg)
>> * Gang Chen (cg @zju.edu.cn)
>> * Wei Wang (wangwei @comp.nus.edu.sg)
>> * Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>> * Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>> * Sheng Wang (wangsh @comp.nus.edu.sg)
>> * Kaiping Zheng (kaiping @comp.nus.edu.sg)
>> * Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>> * Zhongle Xie (zhongle @comp.nus.edu.sg)
>>
>> == Affiliations ==
>> * Beng Chin Ooi, National University of Singapore
>> * Kian Lee Tan, National University of Singapore
>> * Gang Chen, Zhejiang University
>> * Wei Wang, National University of Singapore
>> * Dinh Tien Tuan Anh, National University of Singapore
>> * Jinyang Gao, National University of Singapore
>> * Sheng Wang, National University of Singapore
>> * Kaiping Zheng, National University of Singapore
>> * Zhaojing Luo, National University of Singapore
>> * Zhongle Xie, National University of Singapore
>>
>> == Sponsors ==
>> === Champion ===
>> Thejas Nair (thejas at apache.org)
>>
>> === Nominated Mentors ===
>> * Thejas Nair (thejas at apache.org)
>> * Alan Gates (gates at apache dot org)
>> * Daniel Dai (daijy at apache dot org)
>> * Ted Dunning (tdunning at apache dot org)
>>
>> === Sponsoring Entity ===
>> We are requesting the Incubator to sponsor this project.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org