Re: Indexing help needed

Andrzej Bialecki Fri, 25 May 2007 13:42:35 -0700

jim shirreffs wrote:

Thanks for the advice. I just don't see where in the Lucene code I should plug OOParser into Lucene.
I've walked the code in LIUS and Nutch (moving on to Solr) trying to find common objects. If I can find common objects in Lucene and Nutch, I'll know where to plug in.

You seem to be somewhat confused about what Lucene really is. It's just a library, not an application. It's up to you to provide the logic and glue, or to extend one of the existing demo applications to accommodate your needs. It's also a _plain_ _text_ search library, so if you want to index anything else, you first need to convert it to plain text.
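To make the "library, not application" point concrete, here is a minimal sketch of the kind of glue you would write yourself: one converter per file type turns raw bytes into plain text, and only that plain text becomes field values. Everything here is an illustrative assumption rather than Lucene API: the class `IndexingGlue`, the `PlainTextConverter` interface, the `toFields` method and the field names `path` and `contents` are hypothetical, and a plain `Map` stands in for a Lucene `Document` so the sketch runs without Lucene on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class IndexingGlue {

    /** Hypothetical converter: turns the raw bytes of one file format into plain text. */
    public interface PlainTextConverter {
        String toPlainText(byte[] raw) throws Exception;
    }

    // Converters keyed by lower-case file extension, e.g. "txt", "odt".
    private final Map<String, PlainTextConverter> converters = new HashMap<>();

    public void register(String extension, PlainTextConverter converter) {
        converters.put(extension.toLowerCase(Locale.ROOT), converter);
    }

    /**
     * Converts a file to plain text and returns the field values that would be
     * added to a Lucene Document (a LinkedHashMap stands in for the Document).
     */
    public Map<String, String> toFields(String fileName, byte[] raw) throws Exception {
        int dot = fileName.lastIndexOf('.');
        String ext = dot >= 0 ? fileName.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
        PlainTextConverter converter = converters.get(ext);
        if (converter == null) {
            // Lucene cannot index this format until someone supplies a converter.
            throw new IllegalArgumentException("No plain-text converter for: " + fileName);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("path", fileName);
        fields.put("contents", converter.toPlainText(raw));
        return fields;
    }

    public static void main(String[] args) throws Exception {
        IndexingGlue glue = new IndexingGlue();
        // Plain text needs no conversion beyond decoding the bytes.
        glue.register("txt", raw -> new String(raw, StandardCharsets.UTF_8));
        Map<String, String> fields =
                glue.toFields("notes.txt", "hello lucene".getBytes(StandardCharsets.UTF_8));
        System.out.println(fields); // {path=notes.txt, contents=hello lucene}
    }
}
```

Supporting OpenOffice documents in this design would mean registering one more converter for their extension; the indexing side stays unchanged.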

That's essentially what OOParser does in Nutch: it extracts data from OO documents and converts it to plain text. Disregard the other stuff in that plugin; it has to do with how Nutch passes this data to storage (indexing takes place in a completely different step, so you won't find it there). Just use the parts that extract plain text, and then use this plain text to add fields to Lucene documents.
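As a rough illustration of the "extract plain text" part, here is a standalone sketch that reads an OpenDocument file with only the JDK. It is not Nutch's OOParser code. It assumes the ODF layout (a zip package whose `content.xml` holds the body, with text elements in the `urn:oasis:names:tc:opendocument:xmlns:text:1.0` namespace); older OpenOffice.org 1.x files use a different text namespace and are not handled. The class name `OdfTextExtractor` and its methods are hypothetical, and only paragraphs, headings, spaces, tabs and line breaks are treated specially.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class OdfTextExtractor {

    // Namespace of ODF text elements such as text:p, text:h and text:s (assumed ODF 1.x).
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Finds content.xml inside the ODF zip package and returns its plain text. */
    public static String extractText(InputStream odfPackage) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(odfPackage)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    // readAllBytes stops at the end of the current zip entry.
                    return xmlToText(zip.readAllBytes());
                }
            }
        }
        throw new IOException("content.xml not found in ODF package");
    }

    static String xmlToText(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // content.xml needs no DTD, so refuse one rather than resolve external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        StringBuilder out = new StringBuilder();
        appendText(doc.getDocumentElement(), out);
        return out.toString().trim();
    }

    private static void appendText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        boolean textElement = node.getNodeType() == Node.ELEMENT_NODE
                && TEXT_NS.equals(node.getNamespaceURI());
        if (textElement) {
            switch (node.getLocalName()) {
                case "s": { // text:s stands for c spaces (default 1)
                    String c = ((Element) node).getAttributeNS(TEXT_NS, "c");
                    out.append(" ".repeat(c.isEmpty() ? 1 : Integer.parseInt(c)));
                    return;
                }
                case "tab": out.append('\t'); return;
                case "line-break": out.append('\n'); return;
                default: break;
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            appendText(children.item(i), out);
        }
        // End each paragraph or heading with a newline.
        if (textElement && ("p".equals(node.getLocalName()) || "h".equals(node.getLocalName()))) {
            out.append('\n');
        }
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(extractText(in));
        }
    }
}
```

With Lucene on the classpath, the returned string would become a field value instead of being printed; in Lucene 4 and later that is roughly `doc.add(new TextField("contents", text, Field.Store.NO))` followed by `writer.addDocument(doc)`, where the field name is again only an example.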



--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

