+1, obviously!

Otis
----- Original Message ----
> From: Jörn Kottmann <kottm...@gmail.com>
> To: general@incubator.apache.org
> Sent: Fri, November 19, 2010 4:48:39 AM
> Subject: [VOTE] Accept OpenNLP for incubation
>
> Hi,
>
> let's vote on the acceptance of the OpenNLP Project for incubation
> at the Apache Incubator.
>
> The proposal is on the wiki
> http://wiki.apache.org/incubator/OpenNLPProposal
> and a copy is included below.
>
> The discussion thread can be found here:
> http://mail-archives.apache.org/mod_mbox/incubator-general/201011.mbox/%3c4ce4f1f4.3010...@gmail.com%3e
>
> Please cast your votes:
>
> [ ] +1 Accept OpenNLP for incubation
> [ ] +0 Don't care
> [ ] -1 Reject for the following reason:
>
> The vote is open for at least 72 hours.
>
> Thanks!
> Jörn
>
> = OpenNLP Proposal =
> The following is a proposal for a new top-level project within the ASF.
>
> == Abstract ==
> OpenNLP is a Java machine learning toolkit for natural language processing (NLP).
>
> == Proposal ==
> OpenNLP is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.
>
> The goal of the OpenNLP project will be to create a mature toolkit for the above-mentioned tasks. An additional goal is to provide a large number of pre-built models for a variety of languages, as well as the annotated text resources that those models are derived from.
>
> == Background ==
> OpenNLP was started in 2000 by Jason Baldridge and Gann Bierner while they were graduate students in the Division of Informatics at the University of Edinburgh.
> OpenNLP, broadly speaking, was meant to be a high-level organizational unit for various open source software packages for natural language processing; more practically, it provided a high-level package name for various Java packages of the form opennlp.*. The first OpenNLP software package was the Grok natural language parsing toolkit, which was also the genesis of what is now called the OpenNLP Toolkit. The software released on the OpenNLP SourceForge site (started in 2000, along with Grok) was simply a set of interfaces defined in the package opennlp.common and referred to as the OpenNLP Java API. The actual implementations of natural language processing components were provided in Grok, along with code for sentence parsing with Combinatory Categorial Grammar. This code was used heavily in both Baldridge's and Bierner's dissertations. The first paper that used Grok, and especially the components that would become the OpenNLP Toolkit, is [[http://comp.ling.utexas.edu/jbaldrid/papers/hockenmaier_etal_ESSLLI2000.pdf|Hockenmaier, Bierner and Baldridge (2000)]] (later updated as the journal article [[http://comp.ling.utexas.edu/jbaldrid/papers/HockenmaierEtal2004.pdf|Hockenmaier, Bierner, and Baldridge (2004)]]).
>
> In 2003, it was decided to remove the NLP infrastructure from Grok as there was a clear separation between the basic text processing components and the syntactic and semantic analysis components. At the same time, Grok was rebranded as OpenCCG (openccg.sf.net). The final release of the OpenNLP Java API was made in March 2003; the new OpenNLP Toolkit was created from the API and the Grok text processing components, with version 1.0 being released in April 2004. The OpenNLP Toolkit and OpenCCG have evolved independently since then and have mostly independent and active developer and user communities. OpenCCG is primarily used in the academic community, while OpenNLP has considerable use in both academia and industry.
> As an indication of the academic impact of OpenNLP, a search on Google Scholar (done in March 2010) returned about 650 publications citing the package. Some of these include the OpenNLP website and a few non-publications plus some self-citations. Based on a scan of these results, we estimate that about 500 actual publications have used OpenNLP in their work, and there are an additional 50 or so quasi-publications like surveys and instruction manuals.
>
> The activity level of the OpenNLP project has fluctuated over the past 10+ years, with a large uptick in the last two years especially. Most recently, due both to the availability of new documentation and the release of version 1.5, there have been many more downloads and page views for the OpenNLP project. In fact, September 2010 had the most downloads (1,561) and project web hits (226,391) of any month since the project's beginning in 2000, and October is keeping pace with those figures so far. As a result, OpenNLP has gone from being ranked between 2,000th and 4,000th (between January and May 2010) to being ranked 570, 314, 181 and 439 for July, August, September, and October respectively. Full details are available on the SourceForge statistics page for OpenNLP. (There are 240,000 projects hosted on SourceForge, though this figure includes many, many projects that never actually get started: it seems that about 7-10% of these are stable, active projects, based on a review done in 2007.)
>
> == Rationale ==
> OpenNLP fills a significant gap at the ASF with regard to human language processing tools. While Lucene/Solr, UIMA and Mahout all have some tools in this area, none of them is solely focused on tools for working with natural language, as OpenNLP is.
>
> == Initial Goals ==
> The initial goals of the proposed project are:
>
> * Bring the community together at the ASF and make the development process transparent for them
> * Write user documentation about all major components
> * Set up an automated build, including training and evaluation regression tests
> * Produce an Incubating release
>
> == Current Status ==
> === Meritocracy ===
> Some of the initial committers are familiar with Apache's idea of meritocracy; others aren't. We will get everybody on the same level as part of the incubation process.
>
> === Community ===
> OpenNLP already has a considerable user base, both in industry and academia.
>
> === Core Developers ===
> See the initial committer list.
>
> === Alignment ===
> OpenNLP has tie-ins with several existing Apache projects. We have been distributing wrappers for UIMA for some time now (two UIMA committers also contribute to OpenNLP). We expect this collaboration to strengthen further after our move to Apache.
>
> Another obvious connection exists to some of the projects under the Lucene umbrella. On the one hand, projects like Solr may benefit from the OpenNLP analysis capabilities to create specialized search for particular domains. On the other, OpenNLP may benefit from the machine learning code that is being developed in Mahout, and maybe get some people from that community to lend a hand.
>
> == Known Risks ==
> === Orphaned products ===
> The project has been around for quite a number of years already; it has a well-established user community and a diverse set of committers.
>
> === Inexperience with Open Source ===
> OpenNLP has been an open source project for quite some time. Many of the developers are already familiar with both open source in general and the ASF in particular.
>
> === Homogenous Developers ===
> The current group of developers is very diverse; no two developers work for the same organization.
>
> === Reliance on Salaried Developers ===
> Most of the developers are not paid to work on OpenNLP, so there is little reliance on salaried developers.
>
> === Relationships with Other Apache Products ===
> NLP is often used in search and other algorithms that work with unstructured data; thus OpenNLP is likely to be useful to the Lucene and Solr communities. It also aligns nicely with both Mahout and UIMA.
>
> === An Excessive Fascination with the Apache Brand ===
> We think the project aligns nicely with the goals of the ASF to disseminate source code to the public free of charge. NLP has long been the subject of cutting-edge research, but is often lacking in community and shared knowledge. We believe that by bringing OpenNLP to the ASF, the Apache brand will help deliver NLP capabilities to a much larger audience, and likewise a cutting-edge project like OpenNLP can further the ASF brand by providing users with tried-and-true, as well as new, natural language processing capabilities.
>
> == Documentation ==
> * http://opennlp.sourceforge.net/README.html
> * http://sourceforge.net/apps/mediawiki/opennlp/index.php?title=Main_Page
>
> == Initial Source ==
> The source code is maintained in two CVS repositories on SourceForge.
>
> OpenNLP Maxent: http://maxent.cvs.sourceforge.net/viewvc/maxent/
>
> OpenNLP Tools and OpenNLP UIMA: http://opennlp.cvs.sourceforge.net/viewvc/opennlp/
>
> == Source and Intellectual Property Submission Plan ==
> The OpenNLP source code is already open source under the AL 2.0.
>
> == External Dependencies ==
> ||'''Library''' ||||<style="text-align: center;">'''License''' ||||<style="text-align: center;">'''Description''' ||
> ||JWNL ||||<style="text-align: center;">BSD ||||<style="text-align: center;">Java Wordnet Library ||
> ||JUnit ||||<style="text-align: center;">CPL ||||<style="text-align: center;">Unit Testing Framework ||
> ||UIMA ||||<style="text-align: center;">AL 2.0 ||||<style="text-align: center;">Unstructured Information Management Architecture ||
>
> == Cryptography ==
> OpenNLP neither provides nor uses any cryptography.
>
> == Required Resources ==
> === Mailing lists ===
> * opennlp-dev
> * opennlp-private
> * opennlp-user
> * opennlp-commits
>
> === Subversion Directory ===
> https://svn.apache.org/repos/asf/incubator/opennlp
>
> === Issue Tracking ===
> Jira: OPENNLP
>
> === Other Resources ===
> == Initial Committers ==
> ||'''Name''' ||||<style="text-align: center;">'''Email''' ||||<style="text-align: center;">'''CLA''' ||
> ||Thilo Goetz ||||<style="text-align: center;"> twgo...@apache.org ||||<style="text-align: center;">yes ||
> ||Grant Ingersoll ||||<style="text-align: center;"> gsing...@apache.org ||||<style="text-align: center;">yes ||
> ||Jörn Kottmann ||||<style="text-align: center;"> jo...@apache.org ||||<style="text-align: center;">yes ||
> ||Thomas Morton ||||<style="text-align: center;"> tsmor...@gmail.com ||||<style="text-align: center;">no ||
> ||William Silva ||||<style="text-align: center;"> william.co...@gmail.com ||||<style="text-align: center;">yes ||
> ||Jason Baldridge ||||<style="text-align: center;"> jasonbaldri...@gmail.com ||||<style="text-align: center;">yes ||
> ||James Kosin ||||<style="text-align: center;"> james.ko...@gmail.com ||||<style="text-align: center;">yes ||
>
> == Affiliations ==
> ||'''Name''' ||||<style="text-align: center;">'''Affiliation''' ||
> ||Thilo Goetz ||||<style="text-align: center;">IBM ||
> ||Grant Ingersoll ||||<style="text-align: center;">Lucid Imagination ||
> ||Jörn Kottmann ||||<style="text-align: center;">Infopaq International A/S ||
> ||Thomas Morton ||||<style="text-align: center;">Comcast Corporation ||
> ||William Silva ||||<style="text-align: center;">São Paulo University ||
> ||Jason Baldridge ||||<style="text-align: center;">The University of Texas at Austin ||
> ||James Kosin ||||<style="text-align: center;">International Communications Group, Inc. ||
>
> == Sponsors ==
> === Champion ===
> Grant Ingersoll
>
> === Nominated Mentors ===
> Isabel Drost
>
> Grant Ingersoll
>
> Benson Margulies
>
> === Sponsoring Entity ===
> The Apache Incubator