jim shirreffs wrote:
Thanks for the advice, I just don't see where in the Lucene code I should plug in OOParser.

I've walked the code in LIUS and Nutch (moving on to Solr) trying to find common objects. If I can find common objects in Lucene and Nutch I'll know where to plug in.

You seem to be somewhat confused about what Lucene really is. It's just a library, not an application. It's up to you to provide the logic and glue, or to extend an existing demo application to accommodate your needs. It's also a _plain_ _text_ search library, so if you want to index anything else you first need to convert it to plain text.

That's essentially what OOParser does in Nutch. It extracts data from OO documents and converts it to plain text. Disregard the other stuff in that plugin: it has to do with how Nutch passes this data to storage (indexing takes place in a completely different step, so you won't find it there). Just use the parts that extract plain text, and then use that plain text to add fields to Lucene documents.
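
To make the "convert to plain text first" step concrete, here is a minimal sketch of it. This is not the Nutch plugin's code. It assumes an OpenDocument-style file, i.e. a zip archive whose content.xml holds the text, and it only treats "p" and "h" elements as line breaks. The class and method names (OoTextExtractor, extractText) are made up for illustration, and older files whose content.xml carries a DOCTYPE may need extra parser setup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper: pulls plain text out of an OpenDocument-style zip.
class OoTextExtractor {

    /** Returns the text found in content.xml, or "" if the archive has no such entry. */
    static String extractText(InputStream in)
            throws IOException, ParserConfigurationException, SAXException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue;
                }
                // Buffer the entry so the XML parser cannot close the zip stream on us.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true); // needed so getLocalName() returns "p", "h", ...
                Document doc = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(buf.toByteArray()));
                StringBuilder out = new StringBuilder();
                collect(doc.getDocumentElement(), out);
                return out.toString().trim();
            }
        }
        return "";
    }

    // Depth-first walk: append text nodes, and end each paragraph or heading with a newline.
    private static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
        String name = node.getLocalName();
        if ("p".equals(name) || "h".equals(name)) {
            out.append('\n');
        }
    }
}
```

The returned string is what you would hand to Lucene as a field value. The exact call depends on your Lucene version; in recent releases it is along the lines of doc.add(new TextField("contents", text, Field.Store.NO)), where "contents" is just an example field name.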


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

