Thejas,

Please add me as a mentor if it helps to have diversity. Based on my previous experience with Alan Gates, I fully trust him to act as a highly impartial and effective mentor, but I would be happy to help if there is a concern that could be addressed by having another mentor from a different company.

On Thu, Feb 26, 2015 at 6:12 PM, Thejas Nair <thejas.n...@gmail.com> wrote:
> The incubator proposal has been updated with the feedback so far.
> We have 3 mentors now, but I think it would be good to have additional mentors. Please let me know if anyone is able to help mentor this project.
>
> I am planning to start a vote on the proposal in a day or two.
>
>
> On Fri, Feb 6, 2015 at 5:21 PM, <oo...@comp.nus.edu.sg> wrote:
> >
> > Regarding the number of users using this project -- at this moment, the community is not big. A few local start-ups have been trying to use it (mainly due to an announcement in our seminar list), e.g. one is using it for image recognition (given a photo snapped by a user, it wants to return the same product, and a list of similar products, such as a luxury bag on a passerby). Researchers from outside of NUS may have been using it since we published an application paper on cross domain/modal retrieval in VLDB 2014.
> >
> > We have not announced the project to the outside community yet -- we would announce it in dbworld etc. in due course.
> >
> > Thanks and have a good weekend.
> >
> > regards
> > beng chin
> >
> >>
> >> Thanks for the comments and suggestions.
> >> With permission from Thejas, I would like to respond to point 2.
> >>
> >> We have a huge team down at NUS (National University of Singapore) -- we have about seven database/data mining professors (not including those in systems, networking, and machine learning).
> >> I myself have nine PhD students in a steady state, and I have a few large grants, with a total budget of about 15 million S$ (~12 million USD), that allow me to hire a number of research fellows and research assistants for the next few years. In a steady state, I have about 20 people (PhD students/RA/RF) working with me alone.
> >> Other professors have their own grants (unlike in other countries, it is relatively easy to get large grants in Singapore; many overseas universities, including UIUC, MIT, ETH etc. have research labs funded by Singapore Research Foundation [equivalent of NSF]).
> >>
> >> SINGA is a long term project for us -- while it is a platform as it is, we are using it for healthcare predictive analytics (by working with a hospital associated with the University). Therefore, we will be working on SINGA, not solely as a distributed DL platform, but as a tool that will enable us to do data analytics on some business domains (e.g. healthcare, consumer etc.)
> >>
> >> For the initial set of committers, three are tenured professors, five are students, with 2-5 years to go before they complete their PhD. Quite often, some would stay back as research fellows for a couple of years before they start looking for a job outside. We will work with mentors and new developers (from outside of NUS or Zhejiang University) in enhancing the system.
> >>
> >> The project should survive in that sense.
> >>
> >> (I have an on-going project CIIDAA that has been around since 2008; it was started as another project, epiC, with a different grant, and then we continued the development with a new grant for CIIDAA -- http://www.comp.nus.edu.sg/~ciidaa/ )
> >>
> >> Thanks.
> >>
> >> regards
> >> beng chin
> >> ps: i am not sure if my email will get through to the group.
> >>
> >>
> >> ---------------------------- Original Message ----------------------------
> >> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
> >> From: "Henry Saputra" <henry.sapu...@gmail.com>
> >> Date: Thu, February 5, 2015 2:57 pm
> >> To: "general@incubator.apache.org" <general@incubator.apache.org>
> >> Cc: oo...@comp.nus.edu.sg
> >> --------------------------------------------------------------------------
> >>
> >> Several comments:
> >> -) How many users are already using this project? I would recommend dropping the request for the singa-user list at the beginning.
> >> -) All the initial committers come from the university, and it seems some of them are already close to leaving the university. I am not too sure this project will survive if all of the initial committers are from the university as students.
> >> -) Need to solicit more mentors if this project ever gets to the Apache Incubator.
> >>
> >> - Henry
> >>
> >> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <thejas.n...@gmail.com> wrote:
> >>> The "Relationship with Other Apache Products" section has been updated. The reference to H2O in that section has been removed, and other projects have been added.
> >>> Thanks for the feedback!
> >>>
> >>>
> >>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <thejas.n...@gmail.com> wrote:
> >>>> Thanks for pointing that out Henry! Yes, looks like H2O is not an Apache project, I should have verified that.
> >>>> I will edit that, and revisit that section along with the folks in the Singa community.
> >>>>
> >>>>
> >>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> >>>>> Quick immediate comment that "Apache H2O" is not really an Apache project.
> >>>>>
> >>>>> I assume you are referring to https://github.com/h2oai/h2o (or https://github.com/h2oai/h2o-dev) ?
> >>>>>
> >>>>> - Henry
> >>>>>
> >>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <thejas.n...@gmail.com> wrote:
> >>>>>> Hello everyone,
> >>>>>>
> >>>>>> I would like to propose the inclusion of Singa as an Apache Incubator project.
> >>>>>>
> >>>>>> Here is the proposal - https://wiki.apache.org/incubator/SingaProposal
> >>>>>>
> >>>>>> Please review the proposal and give feedback. I am planning to start a vote after 7 days if the proposal looks good.
> >>>>>> We are also seeking additional Apache mentors for the project.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Thejas
> >>>>>> ==========================================================
> >>>>>> Singa Incubator Proposal
> >>>>>>
> >>>>>> Abstract
> >>>>>>
> >>>>>> SINGA is a distributed deep learning platform.
> >>>>>>
> >>>>>> Proposal
> >>>>>>
> >>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform for training deep learning models, e.g., Deep Convolutional Neural Network and Deep Belief Network. It parallelizes the computation (i.e., training) onto a cluster of nodes by distributing the training data and model automatically to speed up the training. Built-in training algorithms like Back-Propagation and Contrastive Divergence are implemented based on common abstractions of deep learning models. Users can train their own deep learning models by simply customizing these abstractions, much like implementing the Mapper and Reducer in Hadoop.
> >>>>>>
> >>>>>> Background
> >>>>>>
> >>>>>> Deep learning refers to a set of feature (or representation) learning models that consist of multiple (non-linear) layers, where different layers learn different levels of abstractions (representations) of the raw input data.
> >>>>>> Larger (in terms of model parameters) and deeper (in terms of number of layers) models have shown better performance, e.g., lower image classification error in the Large Scale Visual Recognition Challenge. However, a larger model requires more memory and larger training data to reduce over-fitting. Complex numeric operations make the training computation intensive. In practice, training large deep learning models takes weeks or months on a single node (even with GPU).
> >>>>>>
> >>>>>> Rationale
> >>>>>>
> >>>>>> Deep learning has attracted a lot of attention in both academia and industry due to its success in a wide range of areas such as computer vision and speech recognition. However, training of such models is computationally expensive, especially for large and deep models (e.g., with billions of parameters and more than 10 layers). Both Google and Microsoft have developed distributed deep learning systems to make the training more efficient by distributing the computations within a cluster of nodes. However, these systems are closed-source software. Our goal is to leverage the community of open source developers to make SINGA efficient, scalable and easy to use. SINGA is a full-fledged distributed platform that could benefit the community and, in turn, benefit from the community's contributions to further work in this area. We believe the nature of SINGA and our vision for the system fit naturally with Apache's philosophy and development framework.
> >>>>>>
> >>>>>> Initial Goals
> >>>>>>
> >>>>>> We have developed a system for SINGA running on a commodity computer cluster.
> >>>>>> The initial goals include:
> >>>>>> * improving the system in terms of scalability and efficiency, e.g., using Infiniband for network communication and multi-threading for one node computation. We would consider extending SINGA to GPU clusters later.
> >>>>>> * benchmarking with larger datasets (hundreds of millions of training instances) and models (billions of parameters).
> >>>>>> * adding more built-in deep learning models. Users can train the built-in models on their datasets directly.
> >>>>>>
> >>>>>> Current Status
> >>>>>>
> >>>>>> Meritocracy
> >>>>>>
> >>>>>> We would like to follow ASF meritocratic principles to encourage more developers to contribute to this project. We know that only active and excellent developers can make SINGA a successful project. The committer list and PMC will be updated based on developers' performance and commitment. We are also improving the documentation and code to help new developers get started quickly.
> >>>>>>
> >>>>>> Community
> >>>>>>
> >>>>>> SINGA is currently being developed in the Database System Research Lab at the National University of Singapore (NUS) in collaboration with Zhejiang University in China. Our lab has extensive experience in building database related systems, including distributed systems. Six PhD students and research assistants (Jinyang Gao, Kaiping Zheng, Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian Lee Tan) have been working for a year on this project. We are open to recruiting more developers from diverse backgrounds.
> >>>>>>
> >>>>>> Core Developers
> >>>>>>
> >>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have worked on distributed systems for more than 20 years.
> >>>>>> They have collaborated with the industry and have built various large scale systems. Anh Dinh's research is also on distributed systems, albeit with more focus on security aspects. Wei Wang's research is on deep learning problems including deep learning applications and large scale training. Sheng Wang and Jinyang are working on efficient indexing, querying of large scale data and machine learning. Kaiping, Zhaojing and Zhongle are new PhD students who joined SINGA recently. They will work on this project for a longer time (next 4-5 years). While we share common research interests, each member also brings diverse expertise to the team.
> >>>>>>
> >>>>>> Alignment
> >>>>>>
> >>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop, Spark and Mahout, each of which targets a different application domain. SINGA, being a distributed platform for large-scale deep learning, focuses on another important domain which still lacks a robust and scalable open-source platform. The recent success of deep learning models, especially for vision and speech recognition tasks, has generated interest both in applying existing deep learning models and in developing new ones. Thus, an open-source platform for deep learning will be able to attract a large community of users and developers. SINGA is a complex system needing many iterations of design, implementation and testing. Apache's collaboration framework, which encourages active contribution from developers, will inevitably help improve the quality of the system, as shown in the success of Hadoop, Spark, etc.
> >>>>>> Equally important is the community of users, which helps identify real-life applications of deep learning, and helps to evaluate the system's performance and ease of use. We hope to leverage ASF for coordinating and promoting both communities, and in return benefit the communities with another useful tool.
> >>>>>>
> >>>>>> Known Risks
> >>>>>>
> >>>>>> Orphaned products
> >>>>>>
> >>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave the lab in two to four years' time. It is possible that some of them may not have enough time to focus on this project after that. But SINGA is part of our other bigger research projects on building an infrastructure for data intensive applications, which include health-care analytics and brain-inspired computing. Beng Chin and Kian Lee would continue working on it and getting more people involved. For example, three new developers (Kaiping, Zhaojing and Zhongle) joined us recently. Individual developers are welcome to join, making SINGA a diverse community that is robust and independent of any single developer.
> >>>>>>
> >>>>>> Inexperience with Open Source
> >>>>>>
> >>>>>> All the developers are active users and followers of open source projects. Our research lab has a strong commitment to open source, and has released the source code of several systems under open source licenses as a way of contributing back to the open source community. But we do not have much real experience in open source projects with large and well organized communities like those in Apache. This is one reason we chose Apache, which is experienced in open source project incubation. We hope to get help from Apache (e.g., champion and mentors) to establish a healthy path for SINGA.
> >>>>>>
> >>>>>> Homogenous Developers
> >>>>>>
> >>>>>> Although the current developers are researchers in the universities, they have different research interests and project experiences, as mentioned in the section that introduces the core developers. We know that a diverse community is helpful. Hence we are open to the idea of recruiting developers from other regions and organizations.
> >>>>>>
> >>>>>> Reliance on Salaried Developers
> >>>>>>
> >>>>>> As a research project in the university, SINGA's current developer community consists of professors, PhD students, research assistants and postdoctoral fellows. They are driven by their interests to work on this project and have contributed actively since the start of the project. The research assistants and fellows are expected to leave when their contracts expire. However, they are keen to continue to work on the project voluntarily. Moreover, as a long term research project, new research assistants and fellows are likely to join the project.
> >>>>>>
> >>>>>> An Excessive Fascination with the Apache Brand
> >>>>>>
> >>>>>> We did not choose Apache for publicity. We have two purposes. First, we want to leverage Apache's reputation to recruit more developers to make a diverse community. Second, we hope that Apache can help us to establish a healthy path in developing SINGA. Beng Chin and Kian-Lee are established database and distributed system researchers, and together with the other contributors, they sincerely believe that there is a need for a widely accepted open source distributed deep learning platform. The field of deep learning is still in its infancy, and an open source platform will fuel the research in the area.
> >>>>>> Moreover, such a platform will enable researchers to develop new models and algorithms, rather than spending time implementing a deep learning system from scratch. Furthermore, the need for scalability for such a platform is obvious.
> >>>>>>
> >>>>>> Relationship with Other Apache Products
> >>>>>>
> >>>>>> Apache H2O implemented two simple deep learning models, namely the Multi-Layer Perceptron and Deep Auto-encoders. There are two significant differences between H2O and SINGA. First, H2O adopts the Map-Reduce framework, which runs a set of computing nodes in parallel against the training set. Model parameters trained by all computing nodes are averaged as the final model parameters. This training algorithm is different from the distributed training algorithm used by DistBelief, Adam and SINGA, which frequently synchronizes the parameters trained from different nodes. SINGA adopts the parameter server framework to support a wide range of distributed training algorithms and parallelization methods (e.g., data parallelism, model parallelism and hybrid parallelism; H2O only supports data parallelism). Second, in H2O, users are restricted to using the two built-in models. In SINGA, we provide a simple programming model to let users implement their own deep learning models. A new deep learning model can be implemented by customizing the base Layer class for each layer involved in the model. It is similar to writing Hadoop programs where users only need to override the base Mapper and Reducer. We also provide built-in models for users to use directly.
> >>>>>>
> >>>>>> Documentation
> >>>>>>
> >>>>>> The project is hosted at http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
> >>>>>> Documentation can be found on the GitHub wiki page: https://github.com/nusinga/singa/wiki. We continue to refine and improve the documentation.
> >>>>>>
> >>>>>> Initial Source
> >>>>>>
> >>>>>> We use GitHub to maintain our source code: https://github.com/nusinga/singa
> >>>>>>
> >>>>>> Source and Intellectual Property Submission Plan
> >>>>>>
> >>>>>> We plan to release our code base under the Apache License, Version 2.0.
> >>>>>>
> >>>>>> External Dependencies
> >>>>>>
> >>>>>> required by the core code base: glog, gflags, google protobuf, open-blas, mpich, armci-mpi.
> >>>>>> required by data preparation and preprocessing: opencv, hdfs, python.
> >>>>>>
> >>>>>> Cryptography
> >>>>>>
> >>>>>> Not Applicable
> >>>>>>
> >>>>>> Required Resources
> >>>>>>
> >>>>>> Mailing Lists
> >>>>>>
> >>>>>> Currently, we use a Google group for internal discussion. The mailing address is nusi...@googlegroup.com. We will migrate the content to the Apache mailing lists in the future.
> >>>>>>
> >>>>>> singa-dev
> >>>>>> singa-user
> >>>>>> singa-commits
> >>>>>> singa-private (for private discussion within PMC)
> >>>>>>
> >>>>>> Git Repository
> >>>>>>
> >>>>>> We want to continue using git for version control. Hence, a git repo is required.
> >>>>>>
> >>>>>> Issue Tracking
> >>>>>>
> >>>>>> JIRA Singa (SINGA)
> >>>>>>
> >>>>>> Initial Committers
> >>>>>>
> >>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
> >>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
> >>>>>> Gang Chen (cg @zju.edu.cn)
> >>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
> >>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
> >>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
> >>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
> >>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
> >>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
> >>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
> >>>>>>
> >>>>>> Affiliations
> >>>>>>
> >>>>>> Beng Chin Ooi, National University of Singapore
> >>>>>> Kian Lee Tan, National University of Singapore
> >>>>>> Gang Chen, Zhejiang University
> >>>>>> Wei Wang, National University of Singapore
> >>>>>> Dinh Tien Tuan Anh, National University of Singapore
> >>>>>> Jinyang Gao, National University of Singapore
> >>>>>> Sheng Wang, National University of Singapore
> >>>>>> Kaiping Zheng, National University of Singapore
> >>>>>> Zhaojing Luo, National University of Singapore
> >>>>>> Zhongle Xie, National University of Singapore
> >>>>>>
> >>>>>> Sponsors
> >>>>>>
> >>>>>> Champion
> >>>>>>
> >>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>>>>>
> >>>>>> Nominated Mentors
> >>>>>>
> >>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>>>>> Alan Gates (gates at apache dot org) - Hortonworks
> >>>>>> (Seeking more volunteers!)
> >>>>>>
> >>>>>> Sponsoring Entity
> >>>>>>
> >>>>>> We are requesting the Incubator to sponsor this project.
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >>>>>> For additional commands, e-mail: general-h...@incubator.apache.org