I've been working on this for a while: I am trying to get the demo code that
comes with Lucene to index OpenOffice documents. I've looked at the LIUS code
and at the Nutch code but can't find an easy way, so I am digging into the
code myself.
I wrote a KcmiDocument class that returns a Document. In it I call doc.add(),
where I specify the "contents" field and a FileReader:
/*
 * Add the contents of the file to a field named "contents". Specify a Reader,
 * so that the text of the file is tokenized and indexed, but not stored.
 * Note that FileReader expects the file to be in the system's default encoding.
 * If that's not the case, searching for special characters will fail.
 * FileReader is the key; need to add the correct reader for non-text formats.
 */
doc.add(new Field("contents", new FileReader(f)));
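The comment above warns that FileReader always decodes with the platform default encoding. A minimal sketch of the usual workaround, assuming you know the files' real encoding (UTF-8 is only an example here): wrap a FileInputStream in an InputStreamReader with an explicit charset. The class and method names `EncodedReaders.open` are hypothetical, not part of Lucene.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

/** Hypothetical helper: opens a file with an explicit encoding instead of the platform default. */
final class EncodedReaders {
    private EncodedReaders() {}

    static Reader open(File f, String charsetName) throws IOException {
        // Charset.forName throws an unchecked UnsupportedCharsetException for unknown names.
        Charset cs = Charset.forName(charsetName);
        // Unlike FileReader, InputStreamReader lets the caller choose the decoding charset.
        return new BufferedReader(new InputStreamReader(new FileInputStream(f), cs));
    }
}
```

It would be used in place of the FileReader, e.g. `doc.add(new Field("contents", EncodedReaders.open(f, "UTF-8")));`.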
Now if I could just add a file reader for OpenOffice, say OOFileReader(), that
unzipped the file and did all the DOM stuff, then everything would work and the
code changes would be minimal, right? My question is: am I correct in my
thinking? And if so, does anyone know of an OOFileReader? If I am not
correct, what am I missing here? It is fairly important that I learn how to
add different file types like OO or AutoCAD, so we can make a build (with
Lucene) or buy decision.
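The idea in the question can work because the Field constructor used above accepts any java.io.Reader. A minimal sketch of such a reader follows. It assumes that OpenOffice 1.x (.sxw) and OpenDocument (.odt) files are ZIP archives whose body text sits in the `content.xml` entry, and it uses SAX instead of DOM to collect that text. `OOFileReader` is the hypothetical name from the question, not an existing library class. The element names treated as paragraphs or whitespace (`p`, `h`, `s`, `tab`, `tab-stop`, `line-break`) are assumptions about the format, and metadata (`meta.xml`) and styles are ignored.

```java
import java.io.File;
import java.io.FilterReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical OOFileReader: a Reader over the plain text of an .sxw/.odt file. */
class OOFileReader extends FilterReader {

    OOFileReader(File f) throws IOException {
        // Extract all text up front, then serve it through an ordinary StringReader.
        super(new StringReader(extractText(f)));
    }

    static String extractText(File f) throws IOException {
        try (ZipFile zip = new ZipFile(f)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("No content.xml in " + f);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true); // so localName is "p" for <text:p>
                TextCollector collector = new TextCollector();
                factory.newSAXParser().parse(in, collector);
                return collector.text.toString();
            } catch (ParserConfigurationException | SAXException e) {
                throw new IOException("Could not parse content.xml in " + f, e);
            }
        }
    }

    /** Collects character data, adding whitespace where paragraphs and spacing elements occur. */
    private static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Assumed spacing elements: <text:s/>, <text:tab/>, <text:tab-stop/>, <text:line-break/>.
            if (localName.equals("s") || localName.equals("tab")
                    || localName.equals("tab-stop") || localName.equals("line-break")) {
                text.append(' ');
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // End of a paragraph (<text:p>) or heading (<text:h>) becomes a line break.
            if (localName.equals("p") || localName.equals("h")) {
                text.append('\n');
            }
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // OOo 1.x content.xml references "office.dtd", which is not in the archive;
            // supply an empty DTD so the parser does not try to load it.
            return new InputSource(new StringReader(""));
        }
    }
}
```

With this in place, the change to KcmiDocument would be a one-liner such as `doc.add(new Field("contents", new OOFileReader(f)));`. Reading the whole text into memory keeps the sketch simple; a streaming reader would be more work and is probably unnecessary for typical office documents.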
Thanks to all who try to help me out.
Jim S
P.S. If I get it working, I will be happy to email or post the code.