I strongly suggest you solicit more (diverse) mentors before starting the VOTE.
All initial committers are from the same org and all initial mentors are from
the same company (HW). I am not sure this is a good start for an Apache podling.

- Henry

On Thu, Feb 26, 2015 at 9:12 AM, Thejas Nair <thejas.n...@gmail.com> wrote:
> The incubator proposal has been updated with the feedback so far.
> We have 3 mentors now, but I think it would be good to have additional
> mentors. Please let me know if anyone is able to help mentor this
> project.
>
> I am planning to start a vote on the proposal in a day or two.
>
>
> On Fri, Feb 6, 2015 at 5:21 PM, <oo...@comp.nus.edu.sg> wrote:
>>
>> Regarding the number of users using this project -- at this moment, the
>> community is not big. A few local start-ups have been trying to use it
>> (mainly due to announcement in our seminar list), e.g. one is using it for
>> image recognition (given a photo snapped by a user, it wants to return
>> the same product, and a list of similar products, such as a luxury bag
>> on a passerby). Researchers from outside of NUS may have been using it
>> since we published an application paper on cross domain/modal retrieval in
>> VLDB 2014.
>>
>> We have not announced the project to the outside community yet -- we would
>> announce it in dbworld etc in due course.
>>
>> Thanks and have a good weekend.
>>
>> regards
>> beng chin
>>
>>>
>>> Thanks for the comments and suggestions.
>>> With permission from Thejas, I would like to respond to point 2.
>>>
>>> We have a huge team down at NUS (National University of Singapore) --
>>> we have about seven database/data mining professors (not including
>>> those in systems, networking, and machine learning).
>>> I myself have nine PhD students in a steady state, and I have a few large
>>> grants, with a total budget of about 15 million S$ (~12 million USD), that
>>> allows me to hire a number of research fellows and research assistants for
>>> the next few years.
>>> In a constant state, I have about 20 people (PhD
>>> students/RA/RF) working with me alone. Other professors have their own
>>> grants (unlike other countries, it is relatively easy to get large grants
>>> in Singapore; many overseas universities, including UIUC, MIT, ETH etc
>>> have research labs funded by Singapore Research Foundation [equivalent of
>>> NSF]).
>>>
>>> SINGA is a long term project for us -- while it is a platform as it is, we
>>> are using it for healthcare predictive analytics (by working with a
>>> hospital associated with the University). Therefore, we will be working
>>> on SINGA, not solely as a distributed DL platform, but as a tool that will
>>> enable us to do data analytics on some business domains (e.g. healthcare,
>>> consumer etc)
>>>
>>> For the initial set of committers, three are tenured professors, five are
>>> students, with 2-5 years to go before they complete their PhD. Quite
>>> often, some would stay back as a research fellow for a couple of years
>>> before they start looking for a job outside. We will work with mentors
>>> and new developers (from outside of NUS or Zhejiang University) in
>>> enhancing the system.
>>>
>>> The project should survive in that sense.
>>>
>>> (I have an on-going project CIIDAA that has been around since 2008; it was
>>> started as another project, epiC, with a different grant, and then we
>>> continue the development with a new grant for CIIDAA --
>>> http://www.comp.nus.edu.sg/~ciidaa/
>>> )
>>>
>>> Thanks.
>>>
>>> regards
>>> beng chin
>>> ps: i am not sure if my email will get through to the group.
>>>
>>>
>>> ---------------------------- Original Message ----------------------------
>>> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
>>> From: "Henry Saputra" <henry.sapu...@gmail.com>
>>> Date: Thu, February 5, 2015 2:57 pm
>>> To: "general@incubator.apache.org" <general@incubator.apache.org>
>>> Cc: oo...@comp.nus.edu.sg
>>> --------------------------------------------------------------------------
>>>
>>> Several comments:
>>> -) How many users are already using this project? I would recommend
>>> dropping the request for the singa-user list at the beginning.
>>> -) All the initial committers come from a university, and it seems
>>> some of them are already close to leaving it. I am not sure this
>>> project will survive if all of the initial committers are
>>> university students.
>>> -) Need to solicit more mentors if this project is ever to get into the
>>> Apache Incubator.
>>>
>>> - Henry
>>>
>>> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <thejas.n...@gmail.com> wrote:
>>>> The "Relationship with Other Apache Products" section has been
>>>> updated. The reference to H2O in that section has been removed, and
>>>> other projects have been added.
>>>> Thanks for the feedback!
>>>>
>>>>
>>>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <thejas.n...@gmail.com> wrote:
>>>>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
>>>>> Apache project, I should have verified that.
>>>>> I will edit that, and revisit that section along with the folks in
>>>>> Singa community.
>>>>>
>>>>>
>>>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>>>>>> Quick immediate comment that "Apache H2O" is not really an Apache
>>>>>> project.
>>>>>>
>>>>>> I assume you are referring to https://github.com/h2oai/h2o (or
>>>>>> https://github.com/h2oai/h2o-dev) ?
>>>>>>
>>>>>> - Henry
>>>>>>
>>>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <thejas.n...@gmail.com> wrote:
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> I would like to propose the inclusion of Singa as an Apache Incubator
>>>>>>> project.
>>>>>>>
>>>>>>> Here is the proposal -
>>>>>>> https://wiki.apache.org/incubator/SingaProposal
>>>>>>>
>>>>>>> Please review the proposal and give feedback. I am planning to start
>>>>>>> a vote after 7 days if the proposal looks good.
>>>>>>> We are also seeking additional Apache mentors for the project.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Thejas
>>>>>>> ==========================================================
>>>>>>> Singa Incubator Proposal
>>>>>>>
>>>>>>> Abstract
>>>>>>>
>>>>>>> SINGA is a distributed deep learning platform.
>>>>>>>
>>>>>>> Proposal
>>>>>>>
>>>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>>>>>>> for training deep learning models, e.g., Deep Convolutional Neural
>>>>>>> Network and Deep Belief Network. It parallelizes the computation
>>>>>>> (i.e., training) onto a cluster of nodes by distributing the training
>>>>>>> data and model automatically to speed up the training. Built-in
>>>>>>> training algorithms like Back-Propagation and Contrastive Divergence
>>>>>>> are implemented based on common abstractions of deep learning models.
>>>>>>> Users can train their own deep learning models by simply customizing
>>>>>>> these abstractions like implementing the Mapper and Reducer in
>>>>>>> Hadoop.
>>>>>>>
>>>>>>> Background
>>>>>>>
>>>>>>> Deep learning refers to a set of feature (or representation) learning
>>>>>>> models that consist of multiple (non-linear) layers, where different
>>>>>>> layers learn different levels of abstractions (representations) of
>>>>>>> the raw input data.
>>>>>>> Larger (in terms of model parameters) and deeper (in
>>>>>>> terms of number of layers) models have shown better performance,
>>>>>>> e.g., lower image classification error in Large Scale Visual
>>>>>>> Recognition Challenge. However, a larger model requires more memory
>>>>>>> and larger training data to reduce over-fitting. Complex numeric
>>>>>>> operations make the training computation intensive. In practice,
>>>>>>> training large deep learning models takes weeks or months on a single
>>>>>>> node (even with GPU).
>>>>>>>
>>>>>>> Rationale
>>>>>>>
>>>>>>> Deep learning has gained a lot of attention in both academia and
>>>>>>> industry due to its success in a wide range of areas such as computer
>>>>>>> vision and speech recognition. However, training of such models is
>>>>>>> computationally expensive, especially for large and deep models
>>>>>>> (e.g., with billions of parameters and more than 10 layers). Both
>>>>>>> Google and Microsoft have developed distributed deep learning systems
>>>>>>> to make the training more efficient by distributing the computations
>>>>>>> within a cluster of nodes. However, these systems are closed source
>>>>>>> software. Our goal is to leverage the community of open source
>>>>>>> developers to make SINGA efficient, scalable and easy to use. SINGA
>>>>>>> is a full-fledged distributed platform that could benefit the
>>>>>>> community and also benefit from the community's involvement in
>>>>>>> contributing to further work in this area. We believe the nature of
>>>>>>> SINGA and our vision for the system fit naturally with Apache's
>>>>>>> philosophy and development framework.
>>>>>>>
>>>>>>> Initial Goals
>>>>>>>
>>>>>>> We have developed a system for SINGA running on a commodity computer
>>>>>>> cluster.
>>>>>>> The initial goals include:
>>>>>>> * improving the system in terms of scalability and efficiency, e.g.,
>>>>>>>   using Infiniband for network communication and multi-threading for
>>>>>>>   one node computation. We would consider extending SINGA to GPU
>>>>>>>   clusters later.
>>>>>>> * benchmarking with larger datasets (hundreds of millions of training
>>>>>>>   instances) and models (billions of parameters).
>>>>>>> * adding more built-in deep learning models. Users can train the
>>>>>>>   built-in models on their datasets directly.
>>>>>>>
>>>>>>> Current Status
>>>>>>>
>>>>>>> Meritocracy
>>>>>>>
>>>>>>> We would like to follow ASF meritocratic principles to encourage more
>>>>>>> developers to contribute in this project. We know that only active
>>>>>>> and excellent developers can make SINGA a successful project. The
>>>>>>> committer list and PMC will be updated based on developers'
>>>>>>> performance and commitment. We are also improving the documentation
>>>>>>> and code to help new developers get started quickly.
>>>>>>>
>>>>>>> Community
>>>>>>>
>>>>>>> SINGA is currently being developed in the Database System Research
>>>>>>> Lab at the National University of Singapore (NUS) in collaboration
>>>>>>> with Zhejiang University in China. Our lab has extensive experience in
>>>>>>> building database related systems, including distributed systems. Six
>>>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>>>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
>>>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen,
>>>>>>> Kian Lee Tan) have been working for a year on this project. We are
>>>>>>> open to recruiting more developers from diverse backgrounds.
>>>>>>>
>>>>>>> Core Developers
>>>>>>>
>>>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>>>>>>> worked on distributed systems for more than 20 years.
>>>>>>> They have
>>>>>>> collaborated with the industry and have built various large scale
>>>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>>>>>>> with more focus on security aspects. Wei Wang's research is on deep
>>>>>>> learning problems including deep learning applications and large
>>>>>>> scale training. Sheng Wang and Jinyang are working on efficient
>>>>>>> indexing, querying of large scale data and machine learning. Kaiping,
>>>>>>> Zhaojing and Zhongle are new PhD students who joined SINGA recently.
>>>>>>> They will work on this project for a longer time (next 4-5 years).
>>>>>>> While we share common research interests, each member also brings
>>>>>>> diverse expertise to the team.
>>>>>>>
>>>>>>> Alignment
>>>>>>>
>>>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>>>>>>> Spark and Mahout, each of which targets a different application
>>>>>>> domain. SINGA, being a distributed platform for large-scale deep
>>>>>>> learning, focuses on another important domain which still
>>>>>>> lacks a robust and scalable open-source platform. The recent success
>>>>>>> of deep learning models, especially for vision and speech recognition
>>>>>>> tasks, has generated interest in both applying existing deep learning
>>>>>>> models and in developing new ones. Thus, an open-source platform for
>>>>>>> deep learning will be able to attract a large community of users and
>>>>>>> developers. SINGA is a complex system needing many iterations of
>>>>>>> design, implementation and testing. Apache's collaboration framework,
>>>>>>> which encourages active contribution from developers, will inevitably
>>>>>>> help improve the quality of the system, as shown in the success of
>>>>>>> Hadoop, Spark, etc. Equally important is the community of users,
>>>>>>> which helps identify real-life applications of deep learning, and
>>>>>>> helps to evaluate the system's performance and ease-of-use.
>>>>>>> We hope to leverage
>>>>>>> ASF for coordinating and promoting both communities, and in return
>>>>>>> benefit the communities with another useful tool.
>>>>>>>
>>>>>>> Known Risks
>>>>>>>
>>>>>>> Orphaned products
>>>>>>>
>>>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may
>>>>>>> leave the lab in two to four years' time. It is possible that some of
>>>>>>> them may not have enough time to focus on this project after that.
>>>>>>> But SINGA is part of our other bigger research projects on building
>>>>>>> an infrastructure for data intensive applications, which include
>>>>>>> health-care analytics and brain-inspired computing. Beng Chin and
>>>>>>> Kian Lee would continue working on it and getting more people
>>>>>>> involved. For example, three new developers (Kaiping, Zhaojing and
>>>>>>> Zhongle) joined us recently. Individual developers are welcome to
>>>>>>> make SINGA a diverse community that is robust and independent from
>>>>>>> any single developer.
>>>>>>>
>>>>>>> Inexperience with Open Source
>>>>>>>
>>>>>>> All the developers are active users and followers of open source
>>>>>>> projects. Our research lab has a strong commitment to open source,
>>>>>>> and has released the source code of several systems under open source
>>>>>>> licenses as a way of contributing back to the open source community.
>>>>>>> But we do not have much real experience in open source projects with
>>>>>>> large and well organized communities like those in Apache. This is
>>>>>>> one reason we chose Apache, which is experienced in open source
>>>>>>> project incubation. We hope to get help from Apache (e.g., champion
>>>>>>> and mentors) to establish a healthy path for SINGA.
>>>>>>>
>>>>>>> Homogenous Developers
>>>>>>>
>>>>>>> Although the current developers are researchers in the universities,
>>>>>>> they have different research interests and project experiences, as
>>>>>>> mentioned in the section that introduces the core developers. We know
>>>>>>> that a diverse community is helpful. Hence we are open to the idea of
>>>>>>> recruiting developers from other regions and organizations.
>>>>>>>
>>>>>>> Reliance on Salaried Developers
>>>>>>>
>>>>>>> As a research project in the university, SINGA's current development
>>>>>>> community consists of professors, PhD students, research assistants
>>>>>>> and postdoctoral fellows. They are driven by their interests to work
>>>>>>> on this project and have contributed actively since the start of the
>>>>>>> project. The research assistants and fellows are expected to leave
>>>>>>> when their contracts expire. However, they are keen to continue to
>>>>>>> work on the project voluntarily. Moreover, as a long term research
>>>>>>> project, new research assistants and fellows are likely to join the
>>>>>>> project.
>>>>>>>
>>>>>>> An Excessive Fascination with the Apache Brand
>>>>>>>
>>>>>>> We did not choose Apache for publicity. We have two purposes. First,
>>>>>>> we want to leverage Apache's reputation to recruit more developers to
>>>>>>> make a diverse community. Second, we hope that Apache can help us to
>>>>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
>>>>>>> are established database and distributed system researchers, and
>>>>>>> together with the other contributors, they sincerely believe that
>>>>>>> there is a need for a widely accepted open source distributed deep
>>>>>>> learning platform. The field of deep learning is still in its
>>>>>>> infancy, and an open source platform will fuel the research in the
>>>>>>> area.
>>>>>>> Moreover, such a platform will enable researchers to develop new
>>>>>>> models and algorithms, rather than spending time implementing a deep
>>>>>>> learning system from scratch. Furthermore, the need for scalability
>>>>>>> for such a platform is obvious.
>>>>>>>
>>>>>>> Relationship with Other Apache Products
>>>>>>>
>>>>>>> Apache H2O implemented two simple deep learning models, namely the
>>>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>>>>>>> significant differences between H2O and SINGA. First, H2O adopts the
>>>>>>> Map-Reduce framework, which runs a set of computing nodes in parallel
>>>>>>> against the training set. Model parameters trained by all
>>>>>>> computing nodes are averaged as the final model parameters. This
>>>>>>> training algorithm is different from the distributed training
>>>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
>>>>>>> synchronizes the parameters trained from different nodes. SINGA
>>>>>>> adopts the parameter server framework to support a wide range of
>>>>>>> distributed training algorithms and parallelization methods (e.g.,
>>>>>>> data parallelism, model parallelism and hybrid parallelism; H2O only
>>>>>>> supports data parallelism). Second, in H2O, users are restricted to
>>>>>>> using the two built-in models. In SINGA, we provide a simple
>>>>>>> programming model to let users implement their own deep learning
>>>>>>> models. A new deep learning model can be implemented by customizing
>>>>>>> the base Layer class for each layer involved in the model. It is
>>>>>>> similar to writing Hadoop programs where users only need to override
>>>>>>> the base Mapper and Reducer. We also provide built-in models for
>>>>>>> users to use directly.
>>>>>>>
>>>>>>> Documentation
>>>>>>>
>>>>>>> The project is hosted at
>>>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>>>>>>> Documentation can be found at the Github Wiki Page:
>>>>>>> https://github.com/nusinga/singa/wiki.
>>>>>>> We continue to refine and
>>>>>>> improve the documentation.
>>>>>>>
>>>>>>> Initial Source
>>>>>>>
>>>>>>> We use Github to maintain our source code,
>>>>>>> https://github.com/nusinga/singa
>>>>>>>
>>>>>>> Source and Intellectual Property Submission Plan
>>>>>>>
>>>>>>> We plan to release our code base under the Apache License, Version 2.0.
>>>>>>>
>>>>>>> External Dependencies
>>>>>>>
>>>>>>> Required by the core code base: glog, gflags, google protobuf,
>>>>>>> open-blas, mpich, armci-mpi.
>>>>>>> Required by data preparation and preprocessing: opencv, hdfs, python.
>>>>>>>
>>>>>>> Cryptography
>>>>>>>
>>>>>>> Not Applicable
>>>>>>>
>>>>>>> Required Resources
>>>>>>>
>>>>>>> Mailing Lists
>>>>>>>
>>>>>>> Currently, we use a Google group for internal discussion. The mailing
>>>>>>> address is nusi...@googlegroup.com. We will migrate the content to
>>>>>>> the Apache mailing lists in the future.
>>>>>>>
>>>>>>> singa-dev
>>>>>>> singa-user
>>>>>>> singa-commits
>>>>>>> singa-private (for private discussion within PMC)
>>>>>>>
>>>>>>> Git Repository
>>>>>>>
>>>>>>> We want to continue using git for version control. Hence, a git repo
>>>>>>> is required.
>>>>>>>
>>>>>>> Issue Tracking
>>>>>>>
>>>>>>> JIRA Singa (SINGA)
>>>>>>>
>>>>>>> Initial Committers
>>>>>>>
>>>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>>>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>>>>>>> Gang Chen (cg @zju.edu.cn)
>>>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>>>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>>>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>>>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>>>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>>>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>>>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>>>>>>>
>>>>>>> Affiliations
>>>>>>>
>>>>>>> Beng Chin Ooi, National University of Singapore
>>>>>>> Kian Lee Tan, National University of Singapore
>>>>>>> Gang Chen, Zhejiang University
>>>>>>> Wei Wang, National University of Singapore
>>>>>>> Dinh Tien Tuan Anh, National University of Singapore
>>>>>>> Jinyang Gao, National University of Singapore
>>>>>>> Sheng Wang, National University of Singapore
>>>>>>> Kaiping Zheng, National University of Singapore
>>>>>>> Zhaojing Luo, National University of Singapore
>>>>>>> Zhongle Xie, National University of Singapore
>>>>>>>
>>>>>>> Sponsors
>>>>>>>
>>>>>>> Champion
>>>>>>>
>>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>>>>
>>>>>>> Nominated Mentors
>>>>>>>
>>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>>>>>> (Seeking more volunteers!)
>>>>>>>
>>>>>>> Sponsoring Entity
>>>>>>>
>>>>>>> We are requesting the Incubator to sponsor this project.
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>>>>>> For additional commands, e-mail: general-h...@incubator.apache.org