woot is right.

Lewis, can you create the initial bootstrap issue for the podling
per the creation guide and then send it to the team?

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Henry Saputra <henry.sapu...@gmail.com>
Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
Date: Friday, February 12, 2016 at 9:13 PM
To: "general@incubator.apache.org" <general@incubator.apache.org>
Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
Subject: Re: [RESULT] [VOTE] Accept Joshua as an Apache Incubator Podling

>W00t!
>
>On Fri, Feb 12, 2016 at 3:55 PM, Mattmann, Chris A (3980) <
>chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> All,
>>
>> Thank you for VOTE’ing! This VOTE has PASSED with the following
>> tallies:
>>
>> +1
>> Chris Mattmann*
>> Henry Saputra*
>> Tom Barber*
>> Luke Han
>> Hen Yandell
>> Ashish
>> Tommaso Teofili*
>> Jean-Baptiste Onofre*
>> Jim Jagielski
>> Chris Douglas
>> Seetharam Venkatesh*
>> Lewis John McGibbney*
>> Danese Cooper*
>>
>> Thanks to everyone for VOTE’ing. I’ll start bootstrapping the
>> podling with help from Lewis.
>>
>> Cheers,
>> Chris
>>
>> -----Original Message-----
>> From: jpluser <chris.a.mattm...@jpl.nasa.gov>
>> Date: Saturday, January 30, 2016 at 12:00 PM
>> To: "general@incubator.apache.org" <general@incubator.apache.org>
>> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
>> Subject: [VOTE] Accept Joshua as an Apache Incubator Podling
>>
>> >Hi Everyone,
>> >
>> >OK the discussion is now completed. Please VOTE to accept Joshua
>> >into the Apache Incubator. I’ll leave the VOTE open for at least
>> >the next 72 hours, with hopes to close it next Friday the 5th of
>> >February, 2016.
>> >
>> >[ ] +1 Accept Joshua as an Apache Incubator podling.
>> >[ ] +0 Abstain.
>> >[ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
>> >
>> >Of course, I am +1 on this. Please note VOTEs from Incubator PMC
>> >members are binding but all are welcome to VOTE!
>> >
>> >Cheers,
>> >Chris
>> >
>> >-----Original Message-----
>> >From: jpluser <chris.a.mattm...@jpl.nasa.gov>
>> >Date: Tuesday, January 12, 2016 at 10:56 PM
>> >To: "general@incubator.apache.org" <general@incubator.apache.org>
>> >Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
>> >Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
>> >
>> >>Hi Everyone,
>> >>
>> >>Please find attached for your viewing pleasure a proposed new project,
>> >>Apache Joshua, a statistical machine translation toolkit. The proposal
>> >>is in wiki draft form at:
>> >>https://wiki.apache.org/incubator/JoshuaProposal
>> >>
>> >>Proposal text is copied below. I’ll leave the discussion open for a
>> >>week, and we are interested in folks who would like to be initial
>> >>committers and mentors. Please discuss here on the thread.
>> >>
>> >>Thanks!
>> >>
>> >>Cheers,
>> >>Chris (Champion)
>> >>
>> >>———
>> >>
>> >>= Joshua Proposal =
>> >>
>> >>== Abstract ==
>> >>[[joshua-decoder.org|Joshua]] is an open-source statistical machine
>> >>translation toolkit. It includes a Java-based decoder for translating
>> >>with phrase-based, hierarchical, and syntax-based translation models,
>> >>a Hadoop-based grammar extractor (Thrax), and an extensive set of
>> >>tools and scripts for training and evaluating new models from
>> >>parallel text.
>> >>
>> >>== Proposal ==
>> >>Joshua is a state-of-the-art statistical machine translation system
>> >>that provides a number of features:
>> >>
>> >> * Support for the two main paradigms in statistical machine
>> >>   translation: phrase-based and hierarchical / syntactic
>> >> * A sparse feature API that makes it easy to add new feature
>> >>   templates supporting millions of features
>> >> * Native implementations of many tuners (MERT, MIRA, PRO, and
>> >>   AdaGrad)
>> >> * Support for lattice decoding, allowing upstream NLP tools to expose
>> >>   their hypothesis space to the MT system
>> >> * An efficient representation for models, allowing for quick loading
>> >>   of multi-gigabyte model files
>> >> * Fast decoding speed (on par with Moses and mtplz)
>> >> * Language packs: precompiled models that allow the decoder to be
>> >>   run as a black box
>> >> * Thrax, a Hadoop-based tool for learning translation models from
>> >>   parallel text
>> >> * A suite of tools for constructing new models for any language pair
>> >>   for which sufficient training data exists
>> >>
>> >>== Background and Rationale ==
>> >>A number of factors make this a good time for an Apache project
>> >>focused on machine translation (MT): the quality of MT output (for
>> >>many language pairs); the average computing resources available on
>> >>computers, relative to the needs of MT systems; and the availability
>> >>of a number of high-quality toolkits, together with a large base of
>> >>researchers working on them.
>> >>
>> >>Over the past decade, machine translation (MT; the automatic
>> >>translation of one human language to another) has become a reality.
>> >>The research into statistical approaches to translation that began in
>> >>the early nineties, together with the availability of large amounts
>> >>of training data and better computing infrastructure, has produced
>> >>translation results that are “good enough” for a large set of
>> >>language pairs and use cases. Free services like
>> >>[[https://www.bing.com/translator|Bing Translator]] and
>> >>[[https://translate.google.com|Google Translate]] have made these
>> >>services available to the average person through direct interfaces
>> >>and through tools like browser plugins, and sites across the world
>> >>with higher translation needs use them to translate their pages
>> >>automatically.
>> >>
>> >>MT does not require the infrastructure of large corporations in order
>> >>to produce feasible output. Machine translation can be
>> >>resource-intensive, but need not be prohibitively so. Disk and memory
>> >>usage are mostly a matter of model size, which for most language
>> >>pairs is a few gigabytes at most, at which size models can provide
>> >>coverage on the order of tens or even hundreds of thousands of words
>> >>in the input and output languages. The computational complexity of
>> >>the algorithms used to search for translations of new sentences is
>> >>typically linear in the number of words in the input sentence, making
>> >>it possible to run a translation engine on a personal computer.
>> >>
>> >>The research community has produced many different open source
>> >>translation projects for a range of programming languages and under a
>> >>variety of licenses. These projects include the core “decoder”, which
>> >>takes a model and uses it to translate new sentences between the
>> >>language pair the model was defined for. They also typically include
>> >>a large set of tools that enable new models to be built from large
>> >>sets of example translations (“parallel data”) and monolingual texts.
>> >>These toolkits are usually built to support the agendas of the
>> >>(largely) academic researchers that build them: the repeated cycle of
>> >>building new models, tuning model parameters against development
>> >>data, and evaluating them against held-out test data, using standard
>> >>metrics for testing the quality of MT output.
>> >>
>> >>Together, these three factors (the quality of machine translation
>> >>output, the feasibility of translating on standard computers, and the
>> >>availability of tools to build models) make it reasonable for end
>> >>users to use MT as a black-box service, and to run it on their
>> >>personal machines.
>> >>
>> >>These factors make it a good time for an organization with the status
>> >>of the Apache Software Foundation to host a machine translation
>> >>project.
>> >>
>> >>== Current Status ==
>> >>Joshua was originally ported from David Chiang’s Python
>> >>implementation of Hiero by Zhifei Li, while he was a Ph.D. student at
>> >>Johns Hopkins University. The current version is maintained by Matt
>> >>Post at Johns Hopkins’ Human Language Technology Center of
>> >>Excellence. Joshua has had many releases, with over 20 source code
>> >>tags. The last release of Joshua was 6.0.5 on November 5th, 2015.
>> >>
>> >>== Meritocracy ==
>> >>The current developers are familiar with meritocratic open source
>> >>development at Apache. Apache was chosen specifically because we want
>> >>to encourage this style of development for the project.
>> >>
>> >>== Community ==
>> >>Joshua is used widely across the world. Perhaps its biggest (known)
>> >>research / industrial user is the Amazon research group in Berlin.
>> >>Another user is the US Army Research Lab. No formal census has been
>> >>undertaken, but posts to the Joshua technical support mailing list,
>> >>along with the occasional contributions, suggest small research and
>> >>academic communities spread across the world, many of them in India.
>> >>
>> >>During incubation, we will explicitly seek to increase our usage
>> >>across the board, including academic research, industry, and other
>> >>end users interested in statistical machine translation.
>> >>
>> >>== Core Developers ==
>> >>The current set of core developers is fairly small, having fallen
>> >>with the graduation from Johns Hopkins of some core student
>> >>participants. However, Joshua is used fairly widely, as mentioned
>> >>above, and there remains a commitment from the principal researcher
>> >>at Johns Hopkins to continue to use and develop it. Joshua has seen a
>> >>number of new community members become interested recently because of
>> >>its potential use in a number of ongoing DARPA projects such as XDATA
>> >>and Memex.
>> >>
>> >>== Alignment ==
>> >>Joshua is currently Copyright (c) 2015, Johns Hopkins University, all
>> >>rights reserved, and is licensed under the BSD 2-clause license. The
>> >>intention would of course be to relicense this code under the Apache
>> >>License 2.0 (AL2.0), which would permit expanded and increased use of
>> >>the software within Apache projects. There is currently an ongoing
>> >>effort within the Apache Tika community to utilize Joshua within
>> >>Tika’s Translate API; see
>> >>[[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]].
>> >>
>> >>== Known Risks ==
>> >>
>> >>=== Orphaned products ===
>> >>At the moment, regular contributions are made by a single
>> >>contributor, the lead maintainer. He (Matt Post) plans to continue
>> >>development for the next few years, but this is still a single point
>> >>of failure, since the graduate students who worked on the project
>> >>have moved on to jobs, mostly in industry. However, our goal is to
>> >>reduce that risk by growing the community at Apache, starting at
>> >>least with users and participants from NASA JPL.
>> >>
>> >>=== Inexperience with Open Source ===
>> >>The teams at both Johns Hopkins and NASA JPL have experience with
>> >>many OSS software projects at Apache and elsewhere. We understand
>> >>"how it works" here at the foundation.
>> >>
>> >>
>> >>== Relationships with Other Apache Products ==
>> >>Joshua includes dependencies on Hadoop, and is also included as a
>> >>plugin in Apache Tika. We are also interested in coordinating with
>> >>other projects, including Spark, and other projects needing MT
>> >>services for language translation.
>> >>
>> >>== Developers ==
>> >>Joshua has only one regular developer, who is employed by Johns
>> >>Hopkins University. NASA JPL (Mattmann and McGibbney) has been
>> >>contributing lately, including a Brew formula and other contributions
>> >>to the project through the DARPA XDATA and Memex programs.
>> >>
>> >>== Documentation ==
>> >>Documentation and publications related to Joshua can be found at
>> >>joshua-decoder.org. The source for the Joshua documentation is
>> >>currently hosted on Github at
>> >>https://github.com/joshua-decoder/joshua-decoder.github.com
>> >>
>> >>== Initial Source ==
>> >>Current source resides at Github: github.com/joshua-decoder/joshua
>> >>(the main decoder and toolkit) and github.com/joshua-decoder/thrax
>> >>(the grammar extraction tool).
>> >>
>> >>== External Dependencies ==
>> >>Joshua has a number of external dependencies. Only BerkeleyLM (Apache
>> >>2.0) and KenLM (LGPL 2.1) are run-time decoder dependencies (one of
>> >>which is needed for translating sentences with pre-built models). The
>> >>rest are dependencies for the build system and pipeline, used for
>> >>constructing and training new models from parallel text.
>> >>
>> >>Apache projects:
>> >> * Ant
>> >> * Hadoop
>> >> * Commons
>> >> * Maven
>> >> * Ivy
>> >>
>> >>There are also a number of other open-source projects with various
>> >>licenses that the project depends on both dynamically (runtime), and
>> >>statically.
>> >>
>> >>=== GNU GPL 2 ===
>> >> * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/
>> >>
>> >>=== LGPL 2.1 ===
>> >> * KenLM: github.com/kpu/kenlm
>> >>
>> >>=== Apache 2.0 ===
>> >> * BerkeleyLM: https://code.google.com/p/berkeleylm/
>> >>
>> >>=== GNU GPL ===
>> >> * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html
>> >>
>> >>== Required Resources ==
>> >> * Mailing Lists
>> >>   * priv...@joshua.incubator.apache.org
>> >>   * d...@joshua.incubator.apache.org
>> >>   * comm...@joshua.incubator.apache.org
>> >>
>> >> * Git Repos
>> >>   * https://git-wip-us.apache.org/repos/asf/joshua.git
>> >>
>> >> * Issue Tracking
>> >>   * JIRA Joshua (JOSHUA)
>> >>
>> >> * Continuous Integration
>> >>   * Jenkins builds on https://builds.apache.org/
>> >>
>> >> * Web
>> >>   * http://joshua.incubator.apache.org/
>> >>   * wiki at http://cwiki.apache.org
>> >>
>> >>== Initial Committers ==
>> >>The following is a list of the planned initial Apache committers (the
>> >>active subset of the committers for the current repository on Github).
>> >>
>> >> * Matt Post (p...@cs.jhu.edu)
>> >> * Lewis John McGibbney (lewi...@apache.org)
>> >> * Chris Mattmann (mattm...@apache.org)
>> >>
>> >>== Affiliations ==
>> >>
>> >> * Johns Hopkins University
>> >>   * Matt Post
>> >>
>> >> * NASA JPL
>> >>   * Chris Mattmann
>> >>   * Lewis John McGibbney
>> >>
>> >>
>> >>== Sponsors ==
>> >>=== Champion ===
>> >> * Chris Mattmann (NASA/JPL)
>> >>
>> >>=== Nominated Mentors ===
>> >> * Paul Ramirez
>> >> * Lewis John McGibbney
>> >> * Chris Mattmann
>> >>
>> >>== Sponsoring Entity ==
>> >>The Apache Incubator
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
