On Friday, February 27, 2015, Henry Saputra <henry.sapu...@gmail.com> wrote:
> I strongly suggest you solicit more (diverse) mentors before starting the VOTE.
>
> All initial committers are from the same org and all initial mentors are from the same company (HW).

We do have a requirement for diversity; for me, all initial committers being from the same company is just as big a problem as the mentors.

When everyone involved is from the same company, that signals a serious problem which should be addressed before starting a vote.

rgds
jan i

> I am not sure this is a good start for an Apache podling.
>
> - Henry
>
> On Thu, Feb 26, 2015 at 9:12 AM, Thejas Nair <thejas.n...@gmail.com> wrote:
> > The incubator proposal has been updated with the feedback so far.
> > We have 3 mentors now, but I think it would be good to have additional mentors. Please let me know if anyone is able to help mentor this project.
> >
> > I am planning to start a vote on the proposal in a day or two.
> >
> > On Fri, Feb 6, 2015 at 5:21 PM, <oo...@comp.nus.edu.sg> wrote:
> >>
> >> Regarding the number of users using this project -- at this moment, the community is not big. A few local start-ups have been trying to use it (mainly due to an announcement in our seminar list), e.g. one is using it for image recognition (given a photo snapped by a user, it wants to return the same product and a list of similar products, such as a luxury bag on a passerby). Researchers from outside of NUS may have been using it since we published an application paper on cross domain/modal retrieval in VLDB 2014.
> >>
> >> We have not announced the project to the outside community yet -- we would announce it in dbworld etc. in due course.
> >>
> >> Thanks and have a good weekend.
> >>
> >> regards
> >> beng chin
> >>
> >>>
> >>> Thanks for the comments and suggestions.
> >>> With permission from Thejas, I would like to respond to point 2.
> >>>
> >>> We have a huge team down at NUS (National University of Singapore) -- we have about seven database/data mining professors (not including those in systems, networking, and machine learning).
> >>> I myself have nine PhD students in a steady state, and I have a few large grants, with a total budget of about 15 million S$ (~12 million USD), that allow me to hire a number of research fellows and research assistants for the next few years. In a constant state, I have about 20 people (PhD students/RA/RF) working with me alone. Other professors have their own grants (unlike other countries, it is relatively easy to get large grants in Singapore; many overseas universities, including UIUC, MIT, ETH etc., have research labs funded by Singapore Research Foundation [equivalent of NSF]).
> >>>
> >>> SINGA is a long term project for us -- while it is a platform as it is, we are using it for healthcare predictive analytics (by working with a hospital associated with the University). Therefore, we will be working on SINGA, not solely as a distributed DL platform, but as a tool that will enable us to do data analytics on some business domains (e.g. healthcare, consumer etc.)
> >>>
> >>> For the initial set of committers, three are tenured professors, five are students, with 2-5 years to go before they complete their PhD. Quite often, some would stay back as a research fellow for a couple of years before they start looking for a job outside. We will work with mentors and new developers (from outside of NUS or Zhejiang University) in enhancing the system.
> >>>
> >>> The project should survive in that sense.
> >>>
> >>> (I have an on-going project CIIDAA that has been around since 2008; it was started as another project, epiC, with a different grant, and then we continued the development with a new grant for CIIDAA -- http://www.comp.nus.edu.sg/~ciidaa/ )
> >>>
> >>> Thanks.
> >>>
> >>> regards
> >>> beng chin
> >>> ps: i am not sure if my email will get through to the group.
> >>>
> >>> ---------------------------- Original Message ----------------------------
> >>> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
> >>> From: "Henry Saputra" <henry.sapu...@gmail.com>
> >>> Date: Thu, February 5, 2015 2:57 pm
> >>> To: "general@incubator.apache.org" <general@incubator.apache.org>
> >>> Cc: oo...@comp.nus.edu.sg
> >>> --------------------------------------------------------------------------
> >>>
> >>> Several comments:
> >>> -) How many users are already using this project? I would recommend dropping the request for the singa-user list at the beginning.
> >>> -) All the initial committers come from a university, and it seems some of them are already about to leave the university. I am not sure this project will survive if all of the initial committers are from the university as students.
> >>> -) Need to solicit more mentors if this project is ever to get into the Apache Incubator.
> >>>
> >>> - Henry
> >>>
> >>> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <thejas.n...@gmail.com> wrote:
> >>>> The "Relationship with Other Apache Products" section has been updated. The reference to H2O in that section has been removed, and other projects have been added.
> >>>> Thanks for the feedback!
> >>>>
> >>>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <thejas.n...@gmail.com> wrote:
> >>>>> Thanks for pointing that out Henry! Yes, it looks like H2O is not an Apache project; I should have verified that.
> >>>>> I will edit that, and revisit that section along with the folks in the Singa community.
> >>>>>
> >>>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> >>>>>> Quick immediate comment: "Apache H2O" is not really an Apache project.
> >>>>>>
> >>>>>> I assume you are referring to https://github.com/h2oai/h2o (or https://github.com/h2oai/h2o-dev)?
> >>>>>>
> >>>>>> - Henry
> >>>>>>
> >>>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <thejas.n...@gmail.com> wrote:
> >>>>>>> Hello everyone,
> >>>>>>>
> >>>>>>> I would like to propose the inclusion of Singa as an Apache Incubator project.
> >>>>>>>
> >>>>>>> Here is the proposal - https://wiki.apache.org/incubator/SingaProposal
> >>>>>>>
> >>>>>>> Please review the proposal and give feedback. I am planning to start a vote after 7 days if the proposal looks good.
> >>>>>>> We are also seeking additional Apache mentors for the project.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Thejas
> >>>>>>> ==========================================================
> >>>>>>> Singa Incubator Proposal
> >>>>>>>
> >>>>>>> Abstract
> >>>>>>>
> >>>>>>> SINGA is a distributed deep learning platform.
> >>>>>>>
> >>>>>>> Proposal
> >>>>>>>
> >>>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform for training deep learning models, e.g., Deep Convolutional Neural Network and Deep Belief Network. It parallelizes the computation (i.e., training) onto a cluster of nodes by distributing the training data and model automatically to speed up the training. Built-in training algorithms like Back-Propagation and Contrastive Divergence are implemented based on common abstractions of deep learning models.
> >>>>>>> Users can train their own deep learning models by simply customizing these abstractions, much like implementing the Mapper and Reducer in Hadoop.
> >>>>>>>
> >>>>>>> Background
> >>>>>>>
> >>>>>>> Deep learning refers to a set of feature (or representation) learning models that consist of multiple (non-linear) layers, where different layers learn different levels of abstractions (representations) of the raw input data. Larger (in terms of model parameters) and deeper (in terms of number of layers) models have shown better performance, e.g., lower image classification error in the Large Scale Visual Recognition Challenge. However, a larger model requires more memory and larger training data to reduce over-fitting. Complex numeric operations make the training computation intensive. In practice, training large deep learning models takes weeks or months on a single node (even with GPU).
> >>>>>>>
> >>>>>>> Rationale
> >>>>>>>
> >>>>>>> Deep learning has gained a lot of attention in both academia and industry due to its success in a wide range of areas such as computer vision and speech recognition. However, training of such models is computationally expensive, especially for large and deep models (e.g., with billions of parameters and more than 10 layers). Both Google and Microsoft have developed distributed deep learning systems to make the training more efficient by distributing the computations within a cluster of nodes. However, these systems are closed-source software. Our goal is to leverage the community of open source developers to make SINGA efficient, scalable and easy to use.
> >>>>>>> SINGA is a full-fledged distributed platform that could benefit the community and also benefit from the community's involvement in contributing to further work in this area. We believe the nature of SINGA and our vision for the system fit naturally with Apache's philosophy and development framework.
> >>>>>>>
> >>>>>>> Initial Goals
> >>>>>>>
> >>>>>>> We have developed a system for SINGA running on a commodity computer cluster. The initial goals include:
> >>>>>>> * improving the system in terms of scalability and efficiency, e.g., using Infiniband for network communication and multi-threading for one node computation. We would consider extending SINGA to GPU clusters later.
> >>>>>>> * benchmarking with larger datasets (hundreds of millions of training instances) and models (billions of parameters).
> >>>>>>> * adding more built-in deep learning models. Users can train the built-in models on their datasets directly.
> >>>>>>>
> >>>>>>> Current Status
> >>>>>>>
> >>>>>>> Meritocracy
> >>>>>>>
> >>>>>>> We would like to follow ASF meritocratic principles to encourage more developers to contribute to this project. We know that only active and excellent developers can make SINGA a successful project. The committer list and PMC will be updated based on developers' performance and commitment. We are also improving the documentation and code to help new developers get started quickly.
> >>>>>>>
> >>>>>>> Community
> >>>>>>>
> >>>>>>> SINGA is currently being developed in the Database System Research Lab at the National University of Singapore (NUS) in collaboration with Zhejiang University in China. Our lab has extensive experience in building database related systems, including distributed systems.
> >>>>>>> Six PhD students and research assistants (Jinyang Gao, Kaiping Zheng, Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian Lee Tan) have been working for a year on this project. We are open to recruiting more developers from diverse backgrounds.
> >>>>>>>
> >>>>>>> Core Developers
> >>>>>>>
> >>>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have worked on distributed systems for more than 20 years. They have collaborated with industry and have built various large scale systems. Anh Dinh's research is also on distributed systems, albeit with more focus on security aspects. Wei Wang's research is on deep learning problems including deep learning applications and large scale training. Sheng Wang and Jinyang are working on efficient indexing, querying of large scale data and machine learning. Kaiping, Zhaojing and Zhongle are new PhD students who joined SINGA recently. They will work on this project for a longer time (next 4-5 years). While we share common research interests, each member also brings diverse expertise to the team.
> >>>>>>>
> >>>>>>> Alignment
> >>>>>>>
> >>>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop, Spark and Mahout, each of which targets a different application domain. SINGA, being a distributed platform for large-scale deep learning, focuses on another important domain for which a robust and scalable open-source platform is still lacking. The recent success of deep learning models, especially for vision and speech recognition tasks, has generated interest both in applying existing deep learning models and in developing new ones.
> >>>>>>> Thus, an open-source platform for deep learning will be able to attract a large community of users and developers. SINGA is a complex system needing many iterations of design, implementation and testing. Apache's collaboration framework, which encourages active contribution from developers, will inevitably help improve the quality of the system, as shown in the success of Hadoop, Spark, etc. Equally important is the community of users, which helps identify real-life applications of deep learning and helps to evaluate the system's performance and ease-of-use. We hope to leverage ASF for coordinating and promoting both communities, and in return benefit the communities with another useful tool.
> >>>>>>>
> >>>>>>> Known Risks
> >>>>>>>
> >>>>>>> Orphaned products
> >>>>>>>
> >>>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave the lab in two to four years' time. It is possible that some of them may not have enough time to focus on this project after that. But SINGA is part of our other bigger research projects on building an infrastructure for data intensive applications, which include health-care analytics and brain-inspired computing. Beng Chin and Kian Lee would continue working on it and getting more people involved. For example, three new developers (Kaiping, Zhaojing and Zhongle) joined us recently. Individual developers are welcome to make SINGA a diverse community that is robust and independent from any single developer.
> >>>>>>>
> >>>>>>> Inexperience with Open Source
> >>>>>>>
> >>>>>>> All the developers are active users and followers of open source projects.
> >>>>>>> Our research lab has a strong commitment to open source, and has released the source code of several systems under open source licenses as a way of contributing back to the open source community. But we do not have much real experience in open source projects with large and well organized communities like those in Apache. This is one reason we chose Apache, which is experienced in open source project incubation. We hope to get help from Apache (e.g., champion and mentors) to establish a healthy path for SINGA.
> >>>>>>>
> >>>>>>> Homogeneous Developers
> >>>>>>>
> >>>>>>> Although the current developers are researchers in the universities, they have different research interests and project experiences, as mentioned in the section that introduces the core developers. We know that a diverse community is helpful. Hence we are open to the idea of recruiting developers from other regions and organizations.
> >>>>>>>
> >>>>>>> Reliance on Salaried Developers
> >>>>>>>
> >>>>>>> As a research project in the university, SINGA's current developer community consists of professors, PhD students, research assistants and postdoctoral fellows. They are driven by their interests to work on this project and have contributed actively since the start of the project. The research assistants and fellows are expected to leave when their contracts expire. However, they are keen to continue to work on the project voluntarily. Moreover, as this is a long term research project, new research assistants and fellows are likely to join the project.
> >>>>>>>
> >>>>>>> An Excessive Fascination with the Apache Brand
> >>>>>>>
> >>>>>>> We did not choose Apache for publicity. We have two purposes.
> >>>>>>> First, we want to leverage Apache's reputation to recruit more developers to make a diverse community. Second, we hope that Apache can help us to establish a healthy path in developing SINGA. Beng Chin and Kian-Lee are established database and distributed system researchers, and together with the other contributors, they sincerely believe that there is a need for a widely accepted open source distributed deep learning platform. The field of deep learning is still in its infancy, and an open source platform will fuel the research in the area. Moreover, such a platform will enable researchers to develop new models and algorithms, rather than spending time implementing a deep learning system from scratch. Furthermore, the need for scalability for such a platform is obvious.
> >>>>>>>
> >>>>>>> Relationship with Other Apache Products
> >>>>>>>
> >>>>>>> Apache H2O implemented two simple deep learning models, namely the Multi-Layer Perceptron and Deep Auto-encoders. There are two significant differences between H2O and SINGA. First, H2O adopts the Map-Reduce framework, which runs a set of computing nodes in parallel against the training set. Model parameters trained by all computing nodes are averaged as the final model parameters. This training algorithm is different from the distributed training algorithm used by DistBelief, Adam and SINGA, which frequently synchronizes the parameters trained from different nodes. SINGA adopts the parameter server framework to support a wide range of distributed training algorithms and parallelization methods (e.g., data parallelism, model parallelism and hybrid parallelism; H2O only supports data parallelism).
> >>>>>>> Second, in H2O, users are restricted to using the two built-in models. In SINGA, we provide a simple programming model to let users implement their own deep learning models. A new deep learning model can be implemented by customizing the base Layer class for each layer involved in the model. It is similar to writing Hadoop programs, where users only need to override the base Mapper and Reducer. We also provide built-in models for users to use directly.
> >>>>>>>
> >>>>>>> Documentation
> >>>>>>>
> >>>>>>> The project is hosted at http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
> >>>>>>> Documentation can be found at the Github Wiki Page: https://github.com/nusinga/singa/wiki. We continue to refine and improve the documentation.
> >>>>>>>
> >>>>>>> Initial Source
> >>>>>>>
> >>>>>>> We use Github to maintain our source code, https://github.com/nusinga/singa
> >>>>>>>
> >>>>>>> Source and Intellectual Property Submission Plan
> >>>>>>>
> >>>>>>> We plan to make our code base available under the Apache License, Version 2.0.
> >>>>>>>
> >>>>>>> External Dependencies
> >>>>>>>
> >>>>>>> required by the core code base: glog, gflags, google protobuf, open-blas, mpich, armci-mpi.
> >>>>>>> required by data preparation and preprocessing: opencv, hdfs, python.
> >>>>>>>
> >>>>>>> Cryptography
> >>>>>>>
> >>>>>>> Not Applicable
> >>>>>>>
> >>>>>>> Required Resources
> >>>>>>>
> >>>>>>> Mailing Lists
> >>>>>>>
> >>>>>>> Currently, we use a google group for internal discussion. The mailing address is nusi...@googlegroup.com. We will migrate the content to the Apache mailing lists in the future.
> >>>>>>>
> >>>>>>> singa-dev
> >>>>>>> singa-user
> >>>>>>> singa-commits
> >>>>>>> singa-private (for private discussion within PMC)
> >>>>>>>
> >>>>>>> Git Repository
> >>>>>>>
> >>>>>>> We want to continue using git for version control. Hence, a git repo is required.
> >>>>>>>
> >>>>>>> Issue Tracking
> >>>>>>>
> >>>>>>> JIRA Singa (SINGA)
> >>>>>>>
> >>>>>>> Initial Committers
> >>>>>>>
> >>>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
> >>>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
> >>>>>>> Gang Chen (cg @zju.edu.cn)
> >>>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
> >>>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
> >>>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
> >>>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
> >>>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
> >>>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
> >>>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
> >>>>>>>
> >>>>>>> Affiliations
> >>>>>>>
> >>>>>>> Beng Chin Ooi, National University of Singapore
> >>>>>>> Kian Lee Tan, National University of Singapore
> >>>>>>> Gang Chen, Zhejiang University
> >>>>>>> Wei Wang, National University of Singapore
> >>>>>>> Dinh Tien Tuan Anh, National University of Singapore
> >>>>>>> Jinyang Gao, National University of Singapore
> >>>>>>> Sheng Wang, National University of Singapore
> >>>>>>> Kaiping Zheng, National University of Singapore
> >>>>>>> Zhaojing Luo, National University of Singapore
> >>>>>>> Zhongle Xie, National University of Singapore
> >>>>>>>
> >>>>>>> Sponsors
> >>>>>>>
> >>>>>>> Champion
> >>>>>>>
> >>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>>>>>>
> >>>>>>> Nominated Mentors
> >>>>>>>
> >>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>>>>>> Alan Gates (gates at apache dot org) - Hortonworks
> >>>>>>> (Seeking more volunteers!)
> >>>>>>>
> >>>>>>> Sponsoring Entity
> >>>>>>>
> >>>>>>> We are requesting the Incubator to sponsor this project.
> >>>>>>>
> >>>>>>> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >>>>>>> For additional commands, e-mail: general-h...@incubator.apache.org

--
Sent from My iPad, sorry for any misspellings.