Re: Tika for mediawiki ?

Mattmann, Chris A (388J) Sun, 24 Oct 2010 09:02:20 -0700

Hi Guys,

> [...] 
> Until there is a complete spec for parsing media wiki markup, or a java
> library that does a good job of extracting text from documents formatted with
> media wiki markup, I don't think extracting text from media wiki markup
> documents is in scope for Tika.


I'd disagree with that. We don't have complete specs for *many* of the
existing formats we tackle in Tika, and exceptions, bugs, and
platform-specific quirks that require accommodations turn up all the time.

I'd say if someone can find a parsing library for the MediaWiki format and
wants to propose a best practice for its MIME type, or if someone is even
willing to roll their own parsing library, I'd welcome the contribution.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
