Hi Folks,

OK, discussion on the proposal has died down and I'm now calling a formal VOTE on 
the Any23 proposal located here:

http://wiki.apache.org/incubator/Any23Proposal

Proposal text copied at the bottom of this email. I'll leave the VOTE open 
through the rest of the week, and close it around Saturday, October 1, early 
AM PDT.

Please VOTE:

[ ] +1 Accept Any23 into the Apache Incubator
[ ] +0 Don't care
[ ] -1 Don't Accept Any23 into the Apache Incubator because...

Thanks!

Cheers,
Chris

P.S. Here's my +1

Proposal Text:

= Any23 =
== Abstract ==
The following proposal is about ''Anything To Triples'' (Any23 for short), 
a Java library, a Web service and a set of command line tools to extract 
and validate structured data in [[http://www.w3.org/RDF/|RDF]] format from a 
variety of Web documents and markup formats. Any23 is what is informally 
called an ''RDF Distiller''.

== Proposal ==
Any23 "Anything to Triples" is a library written in Java 6 and released under 
the Apache 2.0 License. It provides a set of extractors for scraping semantic 
markup (such as [[http://microformats.org/|Microformats]], 
[[http://www.w3.org/TR/rdfa-syntax/|RDFa]] and 
[[http://www.w3.org/TR/microdata/|Microdata]]) from several sources (HTML4, 
XHTML5, CSV), a set of data validations, and a set of parsers and writers to 
handle the main RDF transport formats (RDF/XML, N-Triples, N-Quads, Turtle). 
The library provides a command line tool for data extraction, conversion and 
validation, and a REST service implementation. The library is plugin based, 
allowing the hot loading of new extractors and validators. Any23 enables 
third-party developers to access structured data from Web pages without 
having to implement ad-hoc scraping techniques. In this sense, Any23 will 
relieve developers from building complex solutions when developing data 
acquisition pipelines and processes targeted at semantically marked-up Web data.

== Background ==
Any23 was initially developed at [[http://www.deri.ie/|DERI (Digital 
Enterprise Research Institute)]] as the main component of the RDF extraction 
pipeline used in [[http://sindice.com/|Sindice (the Semantic Web Index)]]; it 
is now evolving as a joint effort with [[http://www.fbk.eu/|FBK (Fondazione 
Bruno Kessler)]]. At present the official Any23 
[[http://developers.any23.org|developers page]] contains all the documentation, 
while the code is maintained on [[http://code.google.com/p/any23/|Google 
Code]]. An official up-to-date showcase [[http://any23.org|demo]] is also 
available.

== Rationale ==
Providing and maintaining a robust, standard and up-to-date library for 
extracting and validating semantic markup from heterogeneous sources would 
greatly benefit the entire Open Source community. Researchers and academic 
projects have been adopting RDF-related technologies for years, while industry 
is now moving toward Semantic Web technologies in a more concrete way. 
Several industry initiatives related to the 
[[http://en.wikipedia.org/wiki/Semantic_Web|Web of Data]] have been taking 
place in recent months. [[http://schema.org|Schema.org]], for example, is an 
initiative sponsored by 
[[http://www.google.com/about/corporate/company/|Google Inc]], 
[[http://info.yahoo.com/center/us/yahoo/|Yahoo Inc]] and 
[[http://www.microsoft.com/about/companyinformation/en/us/default.aspx|Microsoft Corporation]] 
to structure data in a harmonized way on 
[[http://dev.w3.org/html5/spec/Overview.html|HTML5]] pages. 
[[http://schema.org|Schema.org]] leverages the native 
[[http://dev.w3.org/html5/md/|HTML5 Microdata]] specification. 
[[http://ogp.me/|OpenGraphProtocol]] is the open standard sponsored by 
[[https://www.facebook.com/pages/Facebooking/114721225206500|Facebook Inc]] to 
include metadata in HTML page headers. [[http://ogp.me/|OpenGraphProtocol]], 
initially based on [[http://www.w3.org/TR/xhtml-rdfa-primer/|RDFa]], allows the 
content of a Web page to be described, and its underlying vocabulary can be 
directly represented using RDF.

= Current Status =
== Meritocracy ==
The historical Any23 team believes in meritocracy and has always acted as a 
community. Mailing lists, an open issue tracker and other communication 
channels have been in use since the first release. Adoption into a larger 
community, such as Apache, is the natural evolution for Any23. Moreover, the 
Apache standards will reinforce the existing Any23 community practices and will 
be a foundation for future committer involvement.

== Core Developers ==
In alphabetical order:

 * Davide Palmisano <dpalmisano at gmail dot com>
 * Giovanni Tummarello <giovanni dot tummarello at deri dot org>
 * Michele Mostarda <michele dot mostarda at gmail dot com>
 * Richard Cyganiak <richard at cyganiak dot de>
 * Reto Bachmann-Gmuer <reto at apache dot org>
 * Simone Tripodi <simonetripodi at apache dot org>
 * Szymon Danielczyk <danielczyk.szymon at gmail dot com>
 * Tommaso Teofili <tommaso at apache dot org>

== Alignment ==
The main aim of the project is to develop and maintain a full-featured semantic 
markup distiller that can be used by other Apache projects that need an RDF 
extraction tool. The Any23 library core is written using the following Apache 
libraries:

 * [[http://commons.apache.org/lang/|Apache Commons Lang]]
 * [[http://hc.apache.org/httpclient-3.x/|Apache Commons HTTP Client]]
 * [[http://commons.apache.org/codec/|Apache Commons Codec]]
 * [[http://tika.apache.org/|Apache Tika]]
 * [[http://commons.apache.org/cli/|Apache Commons CLI]]
 * [[http://poi.apache.org/|Apache POI]]

The Any23 service is designed to run within any compliant Servlet container, 
such as Tomcat.

= Known Risks =
== Orphaned Products ==
The increasing number of Any23 adopters and the rising interest in Semantic 
Web related technologies lead us to believe that there is minimal risk of this 
work being abandoned by the community. Moreover, Any23 has already been 
used in production by Sindice.com and other DERI projects for years.

== Inexperience with Open Source ==
All of the committers have experience working in one or more open source 
projects inside and outside the ASF.

== Homogeneous Developers ==
The initial committers are geographically distributed across Europe, 
with no one company being associated with a majority of the developers. Many 
of these initial developers are already experienced Apache committers and all 
are experienced with working in distributed development communities.

== Reliance on Salaried Developers ==
To the best of our knowledge, most of the initial committers are paid to 
develop code for this project because Any23 is used in their organizations' 
infrastructures. Nevertheless, some of the core historical developers (some of 
whom are no longer paid by the original companies behind Any23) are still 
committing even though Any23 is not used in their current organizations. Any23 
has already proven its ability to attract external developers.

== Relationships with Other Apache Products ==
In recent years, other projects relying on the Semantic Web technology stack, 
such as Apache Clerezza, Stanbol and Jena, have entered the ASF incubation 
process. This can be seen as evidence of the consolidation and growing 
adoption of such technologies. Apart from the specifics of those projects, 
which share the same underlying stack, Any23 could be employed in any project 
needing a reliable framework to access structured semantic markup. The Any23 
core could also easily be released as an 
[[http://wiki.apache.org/nutch/PluginCentral|Apache Nutch Plugin]] and then 
used to conveniently fill 
[[http://www.openrdf.org/doc/sesame2/system/ch05.html|SAIL-compliant]] triple 
stores.

== An Excessive Fascination with the Apache Brand ==
Although the Any23 community recognizes the power and attractiveness of the 
ASF brand, we are well aware of our already established role in the wider 
Semantic Web developer community. Any23 has already proved its reliability by 
closely supporting all the new specifications coming from the Microformats 
communities, our major contributors in terms of issues opened for new feature 
requests. Furthermore, we are convinced that we can bring new and fresh energy 
into the ASF, improve our vision, insight and knowledge about the other 
projects and, most importantly, have the possibility of enlarging our small 
community with talented and passionate developers.

= Documentation =
Any23 Documentation

 1. [[http://developers.any23.org/|Any23 Project Homepage]]
 1. [[http://code.google.com/p/any23/|Any23 Developer Homepage]]
 1. [[http://any23.org/|Any23 Live Demo]]

Any23 Related Specifications

 1. [[http://www.w3.org/RDF/|RDF]]
 1. [[http://www.w3.org/TR/html5/|HTML5]]
 1. [[http://www.w3.org/TR/rdfa-syntax/|RDFa]]
 1. [[http://www.w3.org/TR/microdata/|Microdata]]
 1. [[http://microformats.org/|Microformats]]
 1. [[http://www.w3.org/TR/rdf-syntax-grammar/|RDF/XML]]
 1. [[http://www.w3.org/TeamSubmission/turtle/|Turtle]]
 1. [[http://www.w3.org/TR/rdf-testcases/#ntriples|N-Triples]]
 1. [[http://sw.deri.org/2008/07/n-quads/|N-Quads]]

Any23 Other documentation

 1. [[http://www.slideshare.net/dpalmisano/distilling-the-web-of-data-drop-by-drop-with-java|Any23 presentation on Slideshare]]

= Initial Source =
The initial source comprises code developed on 
[[http://code.google.com/p/any23/|GoogleCode]] licensed under the Apache 
License 2.0 (to be contributed under Grant from Giovanni Tummarello for Any23).

= Source and Intellectual Property Submission Plan =
Source code will be moved from the [[http://code.google.com/p/any23/|GoogleCode]] 
space into the SVN space of the podling.

= External Dependencies =
The external dependencies used by Any23 (and their licenses) are as follows:

 * [[http://nekohtml.sourceforge.net/|Nekohtml]] (Apache 2.0)
 * [[http://www.openrdf.org|OpenRDF Sesame]] (BSD-style license)
 * [[http://jetty.codehaus.org/jetty/|Jetty]] (Apache License 2.0 and Eclipse 
Public License 1.0)
 * [[http://code.google.com/p/jspf/|Java Simple Plugin Framework]] (new BSD 
License)
 * [[http://code.google.com/p/boilerpipe/|Boilerpipe]] (Apache License 2.0)
 * [[http://www.slf4j.org/|slf4j]] (MIT License)
 * [[http://www.junit.org/|junit]] (Common Public License - v 1.0)
 * [[http://mockito.org/|Mockito]] (MIT License)

= Cryptography =
The project does not handle cryptography in any way.

= Required Resources =
 * Mailing lists
  * any23-private (with moderated subscriptions)
  * any23-dev
  * any23-user
  * any23-commits
 * Subversion directory
  * https://svn.apache.org/repos/asf/incubator/any23
 * Website
  * Confluence (ANY23)
 * Issue Tracking
  * JIRA (ANY23)

= Initial Committers =
Names of initial committers - in alphabetical order - with current ASF status:

 * Chris Mattmann <mattmann at apache dot org> (Member)
 * Davide Palmisano <dpalmisano at gmail dot com> (ICLA signed)
 * Giovanni Tummarello <giovanni dot tummarello at deri dot org> (ICLA signed)
 * Lewis John !McGibbney <lewismc at apache dot org> (PMC Member)
 * Michele Mostarda <michele dot mostarda at gmail dot com> (ICLA signed)
 * Paul Ramirez <pramirez at apache dot org> (Member)
 * Reto Bachmann-Gmuer <reto at apache dot org> (Committer)
 * Szymon Danielczyk <danielczyk.szymon at gmail dot com> (ICLA signed)

= Sponsors =
== Champion ==
 * Chris Mattmann <mattmann at apache dot org> (Member)

== Nominated Mentors ==
 * Chris Mattmann <mattmann at apache dot org>
 * Paul Ramirez <pramirez at apache dot org>
 * Simone Tripodi <simonetripodi at apache dot org>
 * Tommaso Teofili <tommaso at apache dot org>

== Sponsoring Entity ==
 * Tika PMC

= Other interested people (in alphabetical order) =


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

