woot is right. Lewis, can you create the initial bootstrap issue for the podling per the creation guide and then send to the team?
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Henry Saputra <henry.sapu...@gmail.com>
Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
Date: Friday, February 12, 2016 at 9:13 PM
To: "general@incubator.apache.org" <general@incubator.apache.org>
Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
Subject: Re: [RESULT] [VOTE] Accept Joshua as an Apache Incubator Podling

>W00t!
>
>On Fri, Feb 12, 2016 at 3:55 PM, Mattmann, Chris A (3980) <
>chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> All,
>>
>> Thank you for VOTE’ing! This VOTE has PASSED with the following
>> tallies:
>>
>> +1
>> Chris Mattmann*
>> Henry Saputra*
>> Tom Barber*
>> Luke Han
>> Hen Yandell
>> Ashish
>> Tommaso Teofili*
>> Jean-Baptiste Onofre*
>> Jim Jagielski
>> Chris Douglas
>> Seetharam Venkatesh*
>> Lewis John McGibbney*
>> Danese Cooper*
>>
>> Thanks to everyone for VOTE’ing. I’ll start bootstrapping the
>> podling with help from Lewis.
>>
>> Cheers,
>> Chris
>>
>> -----Original Message-----
>> From: jpluser <chris.a.mattm...@jpl.nasa.gov>
>> Date: Saturday, January 30, 2016 at 12:00 PM
>> To: "general@incubator.apache.org" <general@incubator.apache.org>
>> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
>> Subject: [VOTE] Accept Joshua as an Apache Incubator Podling
>>
>> >Hi Everyone,
>> >
>> >OK the discussion is now completed. Please VOTE to accept Joshua
>> >into the Apache Incubator. I’ll leave the VOTE open for at least
>> >the next 72 hours, with hopes to close it next Friday the 5th of
>> >February, 2016.
>> >
>> >[ ] +1 Accept Joshua as an Apache Incubator podling.
>> >[ ] +0 Abstain.
>> >[ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
>> >
>> >Of course, I am +1 on this. Please note VOTEs from Incubator PMC
>> >members are binding but all are welcome to VOTE!
>> >
>> >Cheers,
>> >Chris
>> >
>> >-----Original Message-----
>> >From: jpluser <chris.a.mattm...@jpl.nasa.gov>
>> >Date: Tuesday, January 12, 2016 at 10:56 PM
>> >To: "general@incubator.apache.org" <general@incubator.apache.org>
>> >Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
>> >Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
>> >
>> >>Hi Everyone,
>> >>
>> >>Please find attached for your viewing pleasure a proposed new
>> >>project, Apache Joshua, a statistical machine translation toolkit.
>> >>The proposal is in wiki draft form at:
>> >>https://wiki.apache.org/incubator/JoshuaProposal
>> >>
>> >>Proposal text is copied below. I’ll leave the discussion open for a
>> >>week and we are interested in folks who would like to be initial
>> >>committers and mentors. Please discuss here on the thread.
>> >>
>> >>Thanks!
>> >>
>> >>Cheers,
>> >>Chris (Champion)
>> >>
>> >>---
>> >>
>> >>= Joshua Proposal =
>> >>
>> >>== Abstract ==
>> >>[[joshua-decoder.org|Joshua]] is an open-source statistical machine
>> >>translation toolkit. It includes a Java-based decoder for
>> >>translating with phrase-based, hierarchical, and syntax-based
>> >>translation models, a Hadoop-based grammar extractor (Thrax), and an
>> >>extensive set of tools and scripts for training and evaluating new
>> >>models from parallel text.
>> >>
>> >>== Proposal ==
>> >>Joshua is a state of the art statistical machine translation system
>> >>that provides a number of features:
>> >>
>> >> * Support for the two main paradigms in statistical machine translation: phrase-based and hierarchical / syntactic.
>> >> * A sparse feature API that makes it easy to add new feature templates supporting millions of features
>> >> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
>> >> * Support for lattice decoding, allowing upstream NLP tools to expose their hypothesis space to the MT system
>> >> * An efficient representation for models, allowing for quick loading of multi-gigabyte model files
>> >> * Fast decoding speed (on par with Moses and mtplz)
>> >> * Language packs: precompiled models that allow the decoder to be run as a black box
>> >> * Thrax, a Hadoop-based tool for learning translation models from parallel text
>> >> * A suite of tools for constructing new models for any language pair for which sufficient training data exists
>> >>
>> >>== Background and Rationale ==
>> >>A number of factors make this a good time for an Apache project
>> >>focused on machine translation (MT): the quality of MT output (for
>> >>many language pairs); the average computing resources available on
>> >>computers, relative to the needs of MT systems; and the availability
>> >>of a number of high-quality toolkits, together with a large base of
>> >>researchers working on them.
>> >>
>> >>Over the past decade, machine translation (MT; the automatic
>> >>translation of one human language to another) has become a reality.
>> >>The research into statistical approaches to translation that began
>> >>in the early nineties, together with the availability of large
>> >>amounts of training data, and better computing infrastructure, have
>> >>all come together to produce translation results that are “good
>> >>enough” for a large set of language pairs and use cases. Free
>> >>services like [[https://www.bing.com/translator|Bing Translator]]
>> >>and [[https://translate.google.com|Google Translate]] have made
>> >>these services available to the average person through direct
>> >>interfaces and through tools like browser plugins, and sites across
>> >>the world with higher translation needs use them to translate their
>> >>pages automatically.
>> >>
>> >>MT does not require the infrastructure of large corporations in
>> >>order to produce feasible output. Machine translation can be
>> >>resource-intensive, but need not be prohibitively so. Disk and
>> >>memory usage are mostly a matter of model size, which for most
>> >>language pairs is a few gigabytes at most, at which size models can
>> >>provide coverage on the order of tens or even hundreds of thousands
>> >>of words in the input and output languages. The computational
>> >>complexity of the algorithms used to search for translations of new
>> >>sentences is typically linear in the number of words in the input
>> >>sentence, making it possible to run a translation engine on a
>> >>personal computer.
>> >>
>> >>The research community has produced many different open source
>> >>translation projects for a range of programming languages and under
>> >>a variety of licenses. These projects include the core “decoder”,
>> >>which takes a model and uses it to translate new sentences between
>> >>the language pair the model was defined for.
>> >>They also typically include a large set of tools that enable new
>> >>models to be built from large sets of example translations
>> >>(“parallel data”) and monolingual texts. These toolkits are usually
>> >>built to support the agendas of the (largely) academic researchers
>> >>that build them: the repeated cycle of building new models, tuning
>> >>model parameters against development data, and evaluating them
>> >>against held-out test data, using standard metrics for testing the
>> >>quality of MT output.
>> >>
>> >>Together, these three factors (the quality of machine translation
>> >>output, the feasibility of translating on standard computers, and
>> >>the availability of tools to build models) make it reasonable for
>> >>end users to use MT as a black-box service, and to run it on their
>> >>personal machines.
>> >>
>> >>These factors make it a good time for an organization with the
>> >>status of the Apache Foundation to host a machine translation
>> >>project.
>> >>
>> >>== Current Status ==
>> >>Joshua was originally ported from David Chiang’s Python
>> >>implementation of Hiero by Zhifei Li, while he was a Ph.D. student
>> >>at Johns Hopkins University. The current version is maintained by
>> >>Matt Post at Johns Hopkins’ Human Language Technology Center of
>> >>Excellence. Joshua has had many releases, with over 20 source code
>> >>tags. The last release of Joshua was 6.0.5 on November 5th, 2015.
>> >>
>> >>== Meritocracy ==
>> >>The current developers are familiar with meritocratic open source
>> >>development at Apache. Apache was chosen specifically because we
>> >>want to encourage this style of development for the project.
>> >>
>> >>== Community ==
>> >>Joshua is used widely across the world. Perhaps its biggest (known)
>> >>research / industrial user is the Amazon research group in Berlin.
>> >>Another user is the US Army Research Lab.
>> >>No formal census has been undertaken, but posts to the Joshua
>> >>technical support mailing list, along with the occasional
>> >>contributions, suggest small research and academic communities
>> >>spread across the world, many of them in India.
>> >>
>> >>During incubation, we will explicitly seek to increase our usage
>> >>across the board, including academic research, industry, and other
>> >>end users interested in statistical machine translation.
>> >>
>> >>== Core Developers ==
>> >>The current set of core developers is fairly small, having fallen
>> >>with the graduation from Johns Hopkins of some core student
>> >>participants. However, Joshua is used fairly widely, as mentioned
>> >>above, and there remains a commitment from the principal researcher
>> >>at Johns Hopkins to continue to use and develop it. Joshua has seen
>> >>a number of new community members become interested recently due to
>> >>the potential for its use in a number of ongoing DARPA projects such
>> >>as XDATA and Memex.
>> >>
>> >>== Alignment ==
>> >>Joshua is currently Copyright (c) 2015, Johns Hopkins University,
>> >>all rights reserved, and is licensed under the BSD 2-clause license.
>> >>It would of course be the intention to relicense this code under
>> >>AL2.0, which would permit expanded and increased use of the software
>> >>within Apache projects. There is currently an ongoing effort within
>> >>the Apache Tika community to utilize Joshua within Tika’s Translate
>> >>API, see
>> >>[[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]].
>> >>
>> >>== Known Risks ==
>> >>
>> >>=== Orphaned products ===
>> >>At the moment, regular contributions are made by a single
>> >>contributor, the lead maintainer. He (Matt Post) plans to continue
>> >>development for the next few years, but this is still a single point
>> >>of failure, since the graduate students who worked on the project
>> >>have moved on to jobs, mostly in industry.
>> >>However, our goal is to mitigate that risk by growing the community
>> >>at Apache, at least with users and participants from NASA JPL.
>> >>
>> >>=== Inexperience with Open Source ===
>> >>The teams at both Johns Hopkins and NASA JPL have experience with
>> >>many OSS software projects at Apache and elsewhere. We understand
>> >>"how it works" here at the foundation.
>> >>
>> >>== Relationships with Other Apache Products ==
>> >>Joshua includes dependencies on Hadoop, and also is included as a
>> >>plugin in Apache Tika. We are also interested in coordinating with
>> >>other projects including Spark, and other projects needing MT
>> >>services for language translation.
>> >>
>> >>== Developers ==
>> >>Joshua has only one regular developer, who is employed by Johns
>> >>Hopkins University. NASA JPL (Mattmann and McGibbney) have been
>> >>contributing lately, including a Brew formula and other
>> >>contributions to the project, through the DARPA XDATA and Memex
>> >>programs.
>> >>
>> >>== Documentation ==
>> >>Documentation and publications related to Joshua can be found at
>> >>joshua-decoder.org. The source for the Joshua documentation is
>> >>currently hosted on Github at
>> >>https://github.com/joshua-decoder/joshua-decoder.github.com
>> >>
>> >>== Initial Source ==
>> >>Current source resides at Github: github.com/joshua-decoder/joshua
>> >>(the main decoder and toolkit) and github.com/joshua-decoder/thrax
>> >>(the grammar extraction tool).
>> >>
>> >>== External Dependencies ==
>> >>Joshua has a number of external dependencies. Only BerkeleyLM
>> >>(Apache 2.0) and KenLM (LGPL 2.1) are run-time decoder dependencies
>> >>(one of which is needed for translating sentences with pre-built
>> >>models). The rest are dependencies for the build system and
>> >>pipeline, used for constructing and training new models from
>> >>parallel text.
>> >>
>> >>Apache projects:
>> >> * Ant
>> >> * Hadoop
>> >> * Commons
>> >> * Maven
>> >> * Ivy
>> >>
>> >>There are also a number of other open-source projects with various
>> >>licenses that the project depends on both dynamically (runtime), and
>> >>statically.
>> >>
>> >>=== GNU GPL 2 ===
>> >> * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/
>> >>
>> >>=== LGPL 2.1 ===
>> >> * KenLM: github.com/kpu/kenlm
>> >>
>> >>=== Apache 2.0 ===
>> >> * BerkeleyLM: https://code.google.com/p/berkeleylm/
>> >>
>> >>=== GNU GPL ===
>> >> * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html
>> >>
>> >>== Required Resources ==
>> >> * Mailing Lists
>> >>   * priv...@joshua.incubator.apache.org
>> >>   * d...@joshua.incubator.apache.org
>> >>   * comm...@joshua.incubator.apache.org
>> >>
>> >> * Git Repos
>> >>   * https://git-wip-us.apache.org/repos/asf/joshua.git
>> >>
>> >> * Issue Tracking
>> >>   * JIRA Joshua (JOSHUA)
>> >>
>> >> * Continuous Integration
>> >>   * Jenkins builds on https://builds.apache.org/
>> >>
>> >> * Web
>> >>   * http://joshua.incubator.apache.org/
>> >>   * wiki at http://cwiki.apache.org
>> >>
>> >>== Initial Committers ==
>> >>The following is a list of the planned initial Apache committers
>> >>(the active subset of the committers for the current repository on
>> >>Github).
>> >>
>> >> * Matt Post (p...@cs.jhu.edu)
>> >> * Lewis John McGibbney (lewi...@apache.org)
>> >> * Chris Mattmann (mattm...@apache.org)
>> >>
>> >>== Affiliations ==
>> >>
>> >> * Johns Hopkins University
>> >>   * Matt Post
>> >>
>> >> * NASA JPL
>> >>   * Chris Mattmann
>> >>   * Lewis John McGibbney
>> >>
>> >>== Sponsors ==
>> >>=== Champion ===
>> >> * Chris Mattmann (NASA/JPL)
>> >>
>> >>=== Nominated Mentors ===
>> >> * Paul Ramirez
>> >> * Lewis John McGibbney
>> >> * Chris Mattmann
>> >>
>> >>== Sponsoring Entity ==
>> >>The Apache Incubator