Hi Hayden,

On Jul 21, 2012, at 6:24 AM, Mr Havercamp wrote:
> Hi Chris
>
> Thanks for your links, etc. I have successfully built and run Tika JAXRS and
> will look to incorporate it into my component so that users can configure and
> use it for Tika extraction (currently I have local Tika and SolrCell (Solr
> server)). I think it is important to provide users with different options
> depending on their requirements (e.g. performance, simplicity,
> cost-effectiveness, etc).

Awesome, +1!

>
> Using Tika JAXRS I can very easily extract metadata which is great. I am also
> able to extract content as plain text but I cannot see a setting for
> returning content in xml/html. Is there a setting for this? Perhaps I'm
> missing something.

You are most likely correct -- the JAXRS module is an evolving spec and, as
Jason put it, we would like to make it, the CLI, and the server interface a
bit more consistent and standardized. If there is something you don't see it
doing (e.g., xml/html output), please file a feature request at:

https://issues.apache.org/jira/browse/TIKA

so that we can keep it in mind going forward when folks are working on this.
Also, contributions are welcome, so if you think you could take a crack at
adding it, awesome. If not, I'm sure one of the devs working on Tika JAXRS
will get around to it.

Thanks!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++