+1

2011/9/27 Mattmann, Chris A (388J) <chris.a.mattm...@jpl.nasa.gov>:
> Hi Folks,
>
> OK, the proposal period has died down and I'm now calling a formal VOTE on
> the Any23 proposal located here:
>
> http://wiki.apache.org/incubator/Any23Proposal
>
> Proposal text copied at the bottom of this email. I'll leave the VOTE open
> through the rest of the week, and close it around Saturday, October 1,
> early AM PDT.
>
> Please VOTE:
>
> [ ] +1 Accept Any23 into the Apache Incubator
> [ ] +0 Don't care
> [ ] -1 Don't Accept Any23 into the Apache Incubator because...
>
> Thanks!
>
> Cheers,
> Chris
>
> P.S. Here's my +1
>
> Proposal Text:
>
> = Any23 =
> == Abstract ==
> The following proposal is about ''Anything To Triples'' (Any23 for short),
> defined as a Java library, a Web service and a set of command line tools to
> extract and validate structured data in [[http://www.w3.org/RDF/|RDF]]
> format from a variety of Web documents and markup formats. Any23 is what is
> informally called an ''RDF Distiller''.
>
> == Proposal ==
> Any23 "Anything to Triples" is a library written in Java 6 and released under
> the Apache 2.0 License. It provides a set of extractors for scraping semantic
> markup (such as [[http://microformats.org/|Microformats]],
> [[http://www.w3.org/TR/rdfa-syntax/|RDFa]] and
> [[http://www.w3.org/TR/microdata/|Microdata]]) from several sources (HTML4,
> XHTML5, CSV), a set of data validations, a set of parsers and writers to
> handle the main RDF transport formats (RDF/XML, N-Triples, N-Quads, Turtle).
> The library provides a command line tool for dealing with data extraction,
> conversion and validation, and a REST service implementation. The library is
> plugin based, allowing the hot loading of new extractors and validators.
> Any23 enables third-party developers to access structured data from Web
> pages without the need of implementing ad-hoc scraping techniques.
> In this sense, Any23 will relieve developers from building complex
> solutions when developing data acquisition pipelines and processes
> targeted to semantically marked-up Web data.
>
> == Background ==
> Any23 was initially developed at [[http://www.deri.ie/|DERI (Digital
> Enterprise Research Institute)]], as the main component of the RDF extraction
> pipeline used in [[http://sindice.com/|Sindice (the Semantic Web Index)]],
> and is now evolving in a joint effort with [[http://www.fbk.eu/|FBK
> (Fondazione Bruno Kessler)]]. At present, the Any23 official
> [[http://developers.any23.org|developers page]] contains all the
> documentation, while the code is maintained on
> [[http://code.google.com/p/any23/|Google Code]]. An official up-to-date
> showcase [[http://any23.org|demo]] is also available.
>
> == Rationale ==
> Providing and maintaining a robust, standard and updated library for
> extracting and validating semantic markup from heterogeneous sources would
> provide large benefits to the entire Open Source Community. Researchers and
> academic projects have been adopting RDF-related technologies for years,
> while the industry is currently moving toward Semantic Web technologies more
> concretely.
> Several industry initiatives related to the
> [[http://en.wikipedia.org/wiki/Semantic_Web|Web of Data]] are taking place
> in these months. [[http://schema.org|Schema.org]], for example, is an
> initiative sponsored by
> [[http://www.google.com/about/corporate/company/|Google Inc]],
> [[http://info.yahoo.com/center/us/yahoo/|Yahoo Inc]] and
> [[http://www.microsoft.com/about/companyinformation/en/us/default.aspx|Microsoft
> Corporation]] to structure the data in a harmonized way on
> [[http://dev.w3.org/html5/spec/Overview.html|HTML5]] pages.
> [[http://schema.org|Schema.org]] leverages the
> [[http://dev.w3.org/html5/md/|HTML5 Microdata]] native specification.
> [[http://ogp.me/|OpenGraphProtocol]] is the open standard sponsored by
> [[https://www.facebook.com/pages/Facebooking/114721225206500|Facebook Inc]]
> to include metadata in HTML page headers.
> [[http://ogp.me/|OpenGraphProtocol]], initially based on
> [[http://www.w3.org/TR/xhtml-rdfa-primer/|RDFa]], allows describing the
> content of a Web page, and its underlying vocabulary could be directly
> represented using RDF.
>
> = Current Status =
> == Meritocracy ==
> The historical Any23 team believes in meritocracy and has always acted as a
> community. Mailing lists, an open issue tracker and other communication
> channels have been in use since the first release. The adoption in a larger
> community, such as Apache, is the natural evolution for Any23. Moreover, the
> Apache standards will enforce the existing Any23 community practices and will
> be a foundation for future committer involvement.
>
> == Core Developers ==
> In alphabetical order:
>
> * Davide Palmisano <dpalmisano at gmail dot com>
> * Giovanni Tummarello <giovanni dot tummarello at deri dot org>
> * Michele Mostarda <michele dot mostarda at gmail dot com>
> * Richard Cyganiak <richard at cyganiak dot de>
> * Reto Bachmann-Gmuer <reto at apache dot org>
> * Simone Tripodi <simonetripodi at apache dot org>
> * Szymon Danielczyk <danielczyk.szymon at gmail dot com>
> * Tommaso Teofili <tommaso at apache dot org>
>
> == Alignment ==
> The main aim of the project is to develop and maintain a fully flavored
> semantic markup distiller that can be used by other Apache projects that
> need an RDF extraction tool. The Any23 library core is written using the
> following Apache libraries.
>
> * [[http://commons.apache.org/lang/|Apache Commons Lang]]
> * [[http://hc.apache.org/httpclient-3.x/|Apache Commons HTTP Client]]
> * [[http://commons.apache.org/codec/|Apache Commons Codec]]
> * [[http://tika.apache.org/|Apache Tika]]
> * [[http://commons.apache.org/cli/|Apache Commons CLI]]
> * [[http://poi.apache.org/|Apache POI]]
>
> The Any23 service is targeted to run within any compliant Servlet container
> like Tomcat.
>
> = Known Risks =
> == Orphaned Products ==
> The increasing number of Any23 adopters and the rising interest in Semantic
> Web related technologies lead us to believe that there is a minimal risk of
> this work being abandoned by the community. Moreover, Any23 has already been
> used in production by Sindice.com and other DERI projects for years.
>
> == Inexperience with Open Source ==
> All of the committers have experience working in one or more open source
> projects inside and outside ASF.
>
> == Homogeneous Developers ==
> The initial committers are geographically distributed across Europe
> with no one company being associated with a majority of the developers. Many
> of these initial developers are experienced Apache committers already and
> all are experienced with working in distributed development communities.
>
> == Reliance on Salaried Developers ==
> To the best of our knowledge, most of the initial committers are
> being paid to develop code for this project due to the adoption of Any23 in
> their organizations' infrastructures. In any case, some of the core historical
> developers (some of them no longer getting paid by the original companies
> behind Any23) are still committing even if Any23 is not employed in their
> current organizations. Any23 has already proven its capability to attract
> external developers.
>
> == Relationships with Other Apache Products ==
> In recent years, other projects relying on the Semantic Web technology
> stack, such as Apache Clerezza, Stanbol and Jena, have been under the ASF
> incubation process. This could be seen as proof of the consolidation and
> growing adoption of such technologies. Apart from the specificity of
> those projects, which share the same underlying stack, Any23 could be
> employed in every project needing a reliable framework to access structured
> semantic markup. Any23 core could also easily be released as an
> [[http://wiki.apache.org/nutch/PluginCentral|Apache Nutch Plugin]] and then
> used to conveniently fill
> [[http://www.openrdf.org/doc/sesame2/system/ch05.html|SAIL-compliant]] triple
> stores.
>
> == An Excessive Fascination with the Apache Brand ==
> Even if the Any23 community recognizes the power and the attractiveness of
> the ASF brand, we are absolutely aware of our already established role in the
> wider Semantic Web developers community. Any23 already proved its reliability
> in closely supporting all the new specifications coming from the Microformats
> communities, our major contributors in terms of opened issues about new
> feature requests. Furthermore, we are convinced that we can enthusiastically
> bring inside the ASF new and fresh energies in order to improve our visions,
> insights and knowledge about the other projects and, most important, to have
> the possibility of enlarging our small community with talented and passionate
> developers.
>
> = Documentation =
> Any23 Documentation
>
> 1. [[http://developers.any23.org/|Any23 Project Homepage]]
> 1. [[http://code.google.com/p/any23/|Any23 Developer Homepage]]
> 1. [[http://any23.org/|Any23 Live Demo]]
>
> Any23 Related Specifications
>
> 1. [[http://www.w3.org/RDF/|RDF]]
> 1. [[http://www.w3.org/TR/html5/|HTML5]]
> 1. [[http://www.w3.org/TR/rdfa-syntax/|RDFa]]
> 1. [[http://www.w3.org/TR/microdata/|Microdata]]
> 1. [[http://microformats.org/|Microformats]]
> 1. [[http://www.w3.org/TR/rdf-syntax-grammar/|RDF/XML]]
> 1. [[http://www.w3.org/TeamSubmission/turtle/|Turtle]]
> 1. [[http://www.w3.org/TR/rdf-testcases/#ntriples|N-Triples]]
> 1. [[http://sw.deri.org/2008/07/n-quads/|N-Quads]]
>
> Any23 Other documentation
>
> 1. [[http://www.slideshare.net/dpalmisano/distilling-the-web-of-data-drop-by-drop-with-java|Any23 presentation on Slideshare]]
>
> = Initial Source =
> The initial source comprises code developed on
> [[http://code.google.com/p/any23/|GoogleCode]] licensed under the Apache
> License 2.0 (to be contributed under Grant from Giovanni Tummarello for
> Any23).
>
> = Source and Intellectual Property Submission Plan =
> Source code will be moved from the
> [[http://code.google.com/p/any23/|GoogleCode]] space into the SVN space of
> the podling.
>
> = External Dependencies =
> All the external dependencies (and their licenses) used by Any23 follow:
>
> * [[http://nekohtml.sourceforge.net/|Nekohtml]] (Apache 2.0)
> * [[http://www.openrdf.org|OpenRDF Sesame]] (BSD-style license)
> * [[http://jetty.codehaus.org/jetty/|Jetty]] (Apache License 2.0 and Eclipse Public License 1.0)
> * [[http://code.google.com/p/jspf/|Java Simple Plugin Framework]] (new BSD License)
> * [[http://code.google.com/p/boilerpipe/|Boilerpipe]] (Apache License 2.0)
> * [[http://www.slf4j.org/|slf4j]] (MIT License)
> * [[http://www.junit.org/|junit]] (Common Public License - v 1.0)
> * [[http://mockito.org/|Mockito]] (MIT License)
>
> = Cryptography =
> The project does not handle cryptography in any way.
>
> = Required Resources =
> * Mailing lists
>   * any23-private (with moderated subscriptions)
>   * any23-dev
>   * any23-user
>   * any23-commits
> * Subversion directory
>   * https://svn.apache.org/repos/asf/incubator/any23
> * Website
>   * Confluence (ANY23)
> * Issue Tracking
>   * JIRA (ANY23)
>
> = Initial Committers =
> Names of initial committers - in alphabetical order - with current ASF status:
>
> * Chris Mattmann <mattmann at apache dot org> (Member)
> * Davide Palmisano <dpalmisano at gmail dot com> (ICLA signed)
> * Giovanni Tummarello <giovanni dot tummarello at deri dot org> (ICLA signed)
> * Lewis John !McGibbney <lewismc at apache dot org> (PMC Member)
> * Michele Mostarda <michele dot mostarda at gmail dot com> (ICLA signed)
> * Paul Ramirez <pramirez at apache dot org> (Member)
> * Reto Bachmann-Gmuer <reto at apache dot org> (Committer)
> * Szymon Danielczyk <danielczyk.szymon at gmail dot com> (ICLA signed)
>
> = Sponsors =
> == Champion ==
> * Chris Mattmann <mattmann at apache dot org> (Member)
>
> == Nominated Mentors ==
> * Chris Mattmann <mattmann at apache dot org>
> * Paul Ramirez <pramirez at apache dot org>
> * Simone Tripodi <simonetripodi at apache dot org>
> * Tommaso Teofili <tommaso at apache dot org>
>
> == Sponsoring Entity ==
> * Tika PMC
>
> = Other interested people (in alphabetical order) =
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>
--
Olivier Lamy
Talend : http://talend.com
http://twitter.com/olamy | http://linkedin.com/in/olamy