jim shirreffs wrote:
Thanks for the advice, I just don't see where in the Lucene code I
should plug OOParcer into Lucene.
I've walked the code in LIUS and Nutch (moving on to Solr) trying to
find common objects. If I can find common objects in Lucene and Nutch
I'll know where to plug in.
You seem to be somewhat confused about what Lucene really is. It's just
a library, and not an application. It's up to you to provide the logic
and glue, or to extend any existing demo application to accomodate your
needs. It's also a _plain_ _text_ search library. So if you want to
index anything else you need to first convert it to a plain text format.
That's essentially what OOParser does in Nutch. It extracts data from OO
documents and converts it to plain text. Disregard other stuff in that
plugin - it has to do with how Nutch passes this data to storage (and
indexing takes place in a completely different step, so you won't find
it here). Just use the parts that extract plain text data - and then use
this plain text data to add fields to Lucene documents.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]