+1 The diversity should be closely examined if the situation hasn't improved by graduation time.
Cos

On Tue, Mar 10, 2015 at 12:17PM, Ted Dunning wrote:
> +1
>
> I am not nearly as worried about the committer diversity, certainly not
> relative to entry into incubator. This is a great project that has already
> shown some very strong willingness to work with others in the short time I
> have interacted with them.
>
>
> On Tue, Mar 10, 2015 at 11:49 AM, Thejas Nair <thejas.n...@gmail.com> wrote:
>
> > Thanks for raising this issue. I agree that committer diversity is
> > important for long term success of a project. I think that should be a
> > criterion for graduation from incubator.
> > I think it is going to be easier to find new contributors as an Apache
> > incubator project.
> >
> >
> > On Tue, Mar 10, 2015 at 9:09 AM, jan i <j...@apache.org> wrote:
> >
> > >
> > > +0 I am really concerned about the diversity of the initial committers:
> > > what happens if the university pulls the plug? I know we all say it will
> > > never happen, but it could happen.
> > >
> > > rgds
> > > jan i.
> > >
> > >
> > > On 10 March 2015 at 16:20, Alan Gates <alanfga...@gmail.com> wrote:
> > >
> > >> +1
> > >>
> > >> Alan.
> > >>
> > >> Thejas Nair <thejas.n...@gmail.com>
> > >> March 10, 2015 at 7:33
> > >> The Singa Incubator Proposal document has been updated based on
> > >> feedback in the proposal thread.
> > >>
> > >> This vote is proposing the inclusion of Apache Singa as incubator
> > >> project.
> > >> The vote will run for at least 72 hours.
> > >>
> > >> [ ] +1 Accept Apache Singa into the Incubator
> > >> [ ] +0 Don’t care.
> > >> [ ] -1 Don’t accept Apache Singa into the Incubator because..
> > >>
> > >> Please vote!
> > >>
> > >> Here is my +1.
> > >>
> > >> Link to version of proposal being voted on :
> > >> https://wiki.apache.org/incubator/SingaProposal?action=recall&rev=10
> > >>
> > >> The text is below
> > >> ----------------------------------------------
> > >>
> > >> = Singa Incubator Proposal =
> > >> == Abstract ==
> > >> SINGA is a distributed deep learning platform.
> > >>
> > >> == Proposal ==
> > >> SINGA is an efficient, scalable and easy-to-use distributed platform
> > >> for training deep learning models, e.g., Deep Convolutional Neural
> > >> Network and Deep Belief Network. It parallelizes the computation
> > >> (i.e., training) onto a cluster of nodes by distributing the training
> > >> data and model automatically to speed up the training. Built-in
> > >> training algorithms like Back-Propagation and Contrastive Divergence
> > >> are implemented based on common abstractions of deep learning models.
> > >> Users can train their own deep learning models by simply customizing
> > >> these abstractions, much like implementing the Mapper and Reducer in
> > >> Hadoop.
> > >>
> > >> == Background ==
> > >> Deep learning refers to a set of feature (or representation) learning
> > >> models that consist of multiple (non-linear) layers, where different
> > >> layers learn different levels of abstractions (representations) of the
> > >> raw input data. Larger (in terms of model parameters) and deeper (in
> > >> terms of number of layers) models have shown better performance, e.g.,
> > >> lower image classification error in the Large Scale Visual Recognition
> > >> Challenge. However, a larger model requires more memory and larger
> > >> training data to reduce over-fitting. Complex numeric operations make
> > >> the training computationally intensive. In practice, training large
> > >> deep learning models takes weeks or months on a single node (even with
> > >> a GPU).
> > >>
> > >> == Rationale ==
> > >> Deep learning has gained a lot of traction in both academia and
> > >> industry due to its success in a wide range of areas such as computer
> > >> vision and speech recognition. However, training of such models is
> > >> computationally expensive, especially for large and deep models (e.g.,
> > >> with billions of parameters and more than 10 layers). Both Google and
> > >> Microsoft have developed distributed deep learning systems to make the
> > >> training more efficient by distributing the computations within a
> > >> cluster of nodes. However, these systems are closed source software.
> > >> Our goal is to leverage the community of open source developers to
> > >> make SINGA efficient, scalable and easy to use. SINGA is a
> > >> full-fledged distributed platform that could benefit the community and
> > >> also benefit from the community's involvement in contributing to
> > >> further work in this area. We believe the nature of SINGA and our
> > >> vision for the system fit naturally with Apache's philosophy and
> > >> development framework.
> > >>
> > >> == Initial Goals ==
> > >> We have developed a system for SINGA running on a commodity computer
> > >> cluster. The initial goals include:
> > >> * improving the system in terms of scalability and efficiency, e.g.,
> > >> using Infiniband for network communication and multi-threading for one
> > >> node computation. We would consider extending SINGA to GPU clusters
> > >> later.
> > >> * benchmarking with larger datasets (hundreds of millions of training
> > >> instances) and models (billions of parameters).
> > >> * adding more built-in deep learning models. Users can train the
> > >> built-in models on their datasets directly.
> > >>
> > >>
> > >> == Current Status ==
> > >> === Meritocracy ===
> > >> We would like to follow ASF meritocratic principles to encourage more
> > >> developers to contribute to this project. We know that only active and
> > >> excellent developers can make SINGA a successful project. The
> > >> committer list and PMC will be updated based on developers'
> > >> performance and commitment. We are also improving the documentation
> > >> and code to help new developers get started quickly.
> > >>
> > >> === Community ===
> > >> SINGA is currently being developed in the Database System Research Lab
> > >> at the National University of Singapore (NUS) in collaboration with
> > >> Zhejiang University in China. Our lab has extensive experience in
> > >> building database related systems, including distributed systems. Six
> > >> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
> > >> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research fellow
> > >> (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian Lee
> > >> Tan) have been working for a year on this project. We are open to
> > >> recruiting more developers from diverse backgrounds.
> > >>
> > >> === Core Developers ===
> > >> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
> > >> worked on distributed systems for more than 20 years. They have
> > >> collaborated with the industry and have built various large scale
> > >> systems. Anh Dinh's research is also on distributed systems, albeit
> > >> with more focus on security aspects. Wei Wang's research is on deep
> > >> learning problems including deep learning applications and large scale
> > >> training. Sheng Wang and Jinyang are working on efficient indexing,
> > >> querying of large scale data and machine learning. Kaiping, Zhaojing
> > >> and Zhongle are new PhD students who joined SINGA recently.
> > >> They will work on this project for a longer time (next 4-5 years).
> > >> While we share common research interests, each member also brings
> > >> diverse expertise to the team.
> > >>
> > >> === Alignment ===
> > >> ASF is already the home of many distributed platforms, e.g., Hadoop,
> > >> Spark and Mahout, each of which targets a different application
> > >> domain. SINGA, being a distributed platform for large-scale deep
> > >> learning, focuses on another important domain for which a robust and
> > >> scalable open-source platform is still lacking. The recent success of
> > >> deep learning models, especially for vision and speech recognition
> > >> tasks, has generated interest in both applying existing deep learning
> > >> models and developing new ones. Thus, an open-source platform for deep
> > >> learning will be able to attract a large community of users and
> > >> developers. SINGA is a complex system needing many iterations of
> > >> design, implementation and testing. Apache's collaboration framework,
> > >> which encourages active contribution from developers, will inevitably
> > >> help improve the quality of the system, as shown by the success of
> > >> Hadoop, Spark, etc. Equally important is the community of users, which
> > >> helps identify real-life applications of deep learning and helps to
> > >> evaluate the system's performance and ease of use. We hope to leverage
> > >> ASF for coordinating and promoting both communities, and in return
> > >> benefit the communities with another useful tool.
> > >>
> > >> == Known Risks ==
> > >> === Orphaned products ===
> > >> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
> > >> the lab in two to four years' time. It is possible that some of them
> > >> may not have enough time to focus on this project after that.
> > >> However, SINGA is part of our other, bigger research projects on
> > >> building an infrastructure for data intensive applications, which
> > >> include health-care analytics and brain-inspired computing. Beng Chin
> > >> and Kian Lee would continue working on it and getting more people
> > >> involved. For example, three new developers (Kaiping, Zhaojing and
> > >> Zhongle) joined us recently. Individual developers are welcome to make
> > >> SINGA a diverse community that is robust and independent from any
> > >> single developer.
> > >>
> > >> === Inexperience with Open Source ===
> > >> All the developers are active users and followers of open source
> > >> projects. Our research lab has a strong commitment to open source, and
> > >> has released the source code of several systems under open source
> > >> license as a way of contributing back to the open source community.
> > >> But we do not have much real experience in open source projects with
> > >> large and well organized communities like those in Apache. This is one
> > >> reason we choose Apache, which is experienced in open source project
> > >> incubation. We hope to get help from Apache (e.g., champion and
> > >> mentors) to establish a healthy path for SINGA.
> > >>
> > >> === Homogenous Developers ===
> > >> Although the current developers are researchers in the universities,
> > >> they have different research interests and project experiences, as
> > >> mentioned in the section that introduces the core developers. We know
> > >> that a diverse community is helpful. Hence we are open to the idea of
> > >> recruiting developers from other regions and organizations.
> > >>
> > >> === Reliance on Salaried Developers ===
> > >> As a research project in the university, SINGA's current developing
> > >> community consists of professors, PhD students, research assistants
> > >> and postdoctoral fellows.
> > >> They are driven by their interests to work on this project and have
> > >> contributed actively since the start of the project. The research
> > >> assistants and fellows are expected to leave when their contracts
> > >> expire. However, they are keen to continue to work on the project
> > >> voluntarily. Moreover, as this is a long term research project, new
> > >> research assistants and fellows are likely to join the project.
> > >>
> > >> === An Excessive Fascination with the Apache Brand ===
> > >> We choose Apache not for publicity. We have two purposes. First, we
> > >> want to leverage Apache's reputation to recruit more developers to
> > >> make a diverse community. Second, we hope that Apache can help us to
> > >> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
> > >> are established database and distributed system researchers, and
> > >> together with the other contributors, they sincerely believe that
> > >> there is a need for a widely accepted open source distributed deep
> > >> learning platform. The field of deep learning is still in its infancy,
> > >> and an open source platform will fuel the research in the area.
> > >> Moreover, such a platform will enable researchers to develop new
> > >> models and algorithms, rather than spending time implementing a deep
> > >> learning system from scratch. Furthermore, the need for scalability
> > >> for such a platform is obvious.
> > >>
> > >> === Relationship with Other Apache Products ===
> > >> Apache Mahout and Apache Spark's ML-LIB are general machine learning
> > >> systems. Deep learning algorithms can thus be implemented on these two
> > >> platforms as well. However, there are differences in training
> > >> efficiency, scalability and usability. Mahout and Spark ML-LIB follow
> > >> models where their nodes run synchronously.
> > >> This is the fundamental difference from Singa, which follows the
> > >> parameter server framework (like Google Brain and Microsoft Adam).
> > >> Singa can run synchronously or asynchronously. The asynchronous mode
> > >> is superior to the synchronous mode in terms of scalability. In
> > >> addition, Singa has some optimizations towards deep learning models
> > >> (e.g., model parallelism, data parallelism and hybrid-parallelism)
> > >> which make Singa more efficient. We also provide an easy-to-use
> > >> programming model for deep learning algorithms.
> > >>
> > >> There are also plans for integration with Apache Hadoop's HDFS as
> > >> storage, to handle large training data.
> > >> Specifically, we store the training data (e.g., images or raw features
> > >> of images) in HDFS, then (pre-)fetch them online.
> > >> We will also explore integration with Hadoop's Yarn and Apache Mesos
> > >> to do resource management.
> > >>
> > >>
> > >> == Documentation ==
> > >> The project is hosted at
> > >> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
> > >> Documentation can be found at the Github Wiki Page:
> > >> https://github.com/nusinga/singa/wiki.
> > >> We continue to refine and improve the documentation.
> > >>
> > >> == Initial Source ==
> > >> We use Github to maintain our source code,
> > >> https://github.com/nusinga/singa
> > >>
> > >> == Source and Intellectual Property Submission Plan ==
> > >> We plan to release our code base under the Apache License, Version 2.0.
> > >>
> > >> == External Dependencies ==
> > >> * required by the core code base: glog, gflags, google protobuf,
> > >> open-blas, mpich, armci-mpi.
> > >> * required by data preparation and preprocessing: opencv, hdfs, python.
> > >>
> > >> == Cryptography ==
> > >> Not Applicable
> > >>
> > >> == Required Resources ==
> > >> === Mailing Lists ===
> > >> Currently, we use google group for internal discussion. The mailing
> > >> address is nusi...@googlegroup.com.
> > >> We will migrate the content to the apache mailing lists in the future.
> > >>
> > >> * singa-dev
> > >> * singa-user
> > >> * singa-commits
> > >> * singa-private (for private discussion within PMC)
> > >>
> > >> === Git Repository ===
> > >> We want to continue using git for version control. Hence, a git repo
> > >> is required.
> > >>
> > >> === Issue Tracking ===
> > >> JIRA Singa (SINGA)
> > >>
> > >> == Initial Committers ==
> > >> * Beng Chin Ooi (ooibc @comp.nus.edu.sg)
> > >> * Kian Lee Tan (tankl @comp.nus.edu.sg)
> > >> * Gang Chen (cg @zju.edu.cn)
> > >> * Wei Wang (wangwei @comp.nus.edu.sg)
> > >> * Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
> > >> * Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
> > >> * Sheng Wang (wangsh @comp.nus.edu.sg)
> > >> * Kaiping Zheng (kaiping @comp.nus.edu.sg)
> > >> * Zhaojing Luo (zhaojing @comp.nus.edu.sg)
> > >> * Zhongle Xie (zhongle @comp.nus.edu.sg)
> > >>
> > >> == Affiliations ==
> > >> * Beng Chin Ooi, National University of Singapore
> > >> * Kian Lee Tan, National University of Singapore
> > >> * Gang Chen, Zhejiang University
> > >> * Wei Wang, National University of Singapore
> > >> * Dinh Tien Tuan Anh, National University of Singapore
> > >> * Jinyang Gao, National University of Singapore
> > >> * Sheng Wang, National University of Singapore
> > >> * Kaiping Zheng, National University of Singapore
> > >> * Zhaojing Luo, National University of Singapore
> > >> * Zhongle Xie, National University of Singapore
> > >>
> > >> == Sponsors ==
> > >> === Champion ===
> > >> Thejas Nair (thejas at apache.org)
> > >>
> > >> === Nominated Mentors ===
> > >> * Thejas Nair (thejas at apache.org)
> > >> * Alan Gates (gates at apache dot org)
> > >> * Daniel Dai (daijy at apache dot org)
> > >> * Ted Dunning (tdunning at apache dot org)
> > >>
> > >> === Sponsoring Entity ===
> > >> We are requesting the Incubator to sponsor this project.
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > >> For additional commands, e-mail: general-h...@incubator.apache.org