Thanks for the advice. I just don't see where in the Lucene code I should
plug OOParcer in.
I've walked the code in LIUS and Nutch (moving on to Solr) trying to find
common objects. If I can find common objects in Lucene and Nutch I'll know
where to plug in.
The Lucene objects look like this:
IndexWriter
Analyzer
StandardAnalyzer
Document
Reader
FileReader
StringReader
DocumentWriter
But when I search through the Nutch or LIUS code I cannot find these objects.
LIUS uses reflection, so I'm not going to find anything in the code, but
unfortunately the liusConfig.xml is incomplete and I cannot find the class
names for the OpenOffice stuff in it.
This is all very frustrating, since it should be relatively easy to add
support for unsupported formats. The Lucene code is very nice, the LIUS code
less so. It seems Lucene is set up to drop in new file formats; I just do not
know where to drop them in or what kind of objects need to be dropped in.
Oh well, I guess I will code up a Reader that just spits out "Here I am" a few
hundred times and see what happens. LOL.
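For what it's worth, that throwaway Reader might look roughly like the sketch
below. The class name HereIAmReader is made up for this experiment, and the
commented-out line at the end assumes the Lucene 2.x Field(String, Reader)
constructor; the rest uses only the JDK.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical test Reader: emits "Here I am\n" a fixed number of times.
class HereIAmReader extends Reader {
    private static final String LINE = "Here I am\n";
    private final long total; // total number of chars this Reader will produce
    private long pos = 0;     // number of chars produced so far

    HereIAmReader(int repeats) {
        this.total = (long) LINE.length() * repeats;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (pos >= total) return -1; // end of stream
        int n = (int) Math.min(len, total - pos);
        for (int i = 0; i < n; i++) {
            buf[off + i] = LINE.charAt((int) ((pos + i) % LINE.length()));
        }
        pos += n;
        return n;
    }

    @Override
    public void close() {
        // nothing to release
    }

    public static void main(String[] args) throws IOException {
        Reader r = new HereIAmReader(3);
        int c;
        while ((c = r.read()) != -1) {
            System.out.print((char) c);
        }
        // Assumption (Lucene 2.x): a Reader can be indexed as a tokenized field, e.g.
        //   doc.add(new Field("contents", new HereIAmReader(300)));
    }
}
```

If the phrase then turns up in search results, the place where that Reader is
handed to the Document is presumably where a real format parser would go.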
Thank you for the reply and advice.
jim s
----- Original Message -----
From: "Andrzej Bialecki" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Friday, May 25, 2007 1:10 PM
Subject: Re: Indexing help needed
jim shirreffs wrote:
Thanks to all that try to help me out
Jim S
P.S. If I get it working I will be happy to email or post the code.
If you looked at the code in Nutch, you can take most of the parse-oo
plugin verbatim, because all this plugin does is it extracts the text
content and metadata from OO files.
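As a rough illustration of what extracting the text content involves: an
OpenDocument file is a zip archive whose body text lives in an entry named
content.xml. The sketch below is not the Nutch parse-oo code. The names
OdfTextSketch and extractText are made up for illustration, it uses only the
JDK, and it ignores metadata and most of the document structure.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: pulls plain text out of an OpenDocument file.
class OdfTextSketch {

    static String extractText(InputStream odf) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // collect all character data
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Paragraphs and headings each end a line of output.
                if ("text:p".equals(qName) || "text:h".equals(qName)) {
                    text.append('\n');
                }
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Skip any external DTD instead of trying to fetch it.
                return new InputSource(new StringReader(""));
            }
        };
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    SAXParserFactory.newInstance().newSAXParser().parse(zip, handler);
                    break;
                }
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.print(extractText(in));
        }
    }
}
```

The returned String could then be wrapped in a StringReader, or added to a
Document directly as a field value, in the same way as text read from any
other file.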
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]