Hi Guys,

> [...]
> Until there is a complete spec for parsing media wiki markup, or a java
> library that does a good job of extracting text from documents formatted with
> media wiki markup, I don't think extracting text from media wiki markup
> documents is in scope for Tika.

I'd disagree with that. We don't have complete specs for *many* of the existing formats we tackle in Tika, and exceptions, bugs, and platform-specific quirks that require accommodations turn up all the time. If someone can find a parsing library for the MediaWiki format and wants to propose a best practice for the MIME spec, or if someone is even willing to roll their own parsing library, I'd welcome the contribution.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
