I've been working on this for a while. I am trying to get the demo code that comes with Lucene to index OpenOffice documents. I've looked at the LIUS code and at the Nutch code, but I can't find an easy way, so I am digging into the code myself.


I wrote a KcmiDocument class that returns a Document. In it I call doc.add(), where I specify the field name "contents" and a FileReader:

/*
 * Add the contents of the file to a field named "contents". Specify a Reader,
 * so that the text of the file is tokenized and indexed, but not stored.
 * Note that FileReader expects the file to be in the system's default encoding.
 * If that's not the case, searching for special characters will fail.
 * FileReader is the key; I need to supply the correct reader for non-text formats.
 */

doc.add(new Field("contents", new FileReader(f)));

Now, if I could just add a reader for OpenOffice files, say OOFileReader(), that unzipped the file and did all the DOM work, then everything would work and the code changes would be minimal, right? My question is: am I correct in my thinking? If so, does anyone know of an existing OOFileReader? If I am not correct, what am I missing here? It is fairly important that I learn how to add different file types, such as OpenOffice or AutoCAD, so that we can make a build (with Lucene) or buy decision.
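
To make the idea concrete, here is a rough sketch of what such a reader might look like. It relies on the fact that an OpenOffice document is a zip archive whose body text lives in an entry named content.xml. The class name OOFileReader and its methods open() and extractText() are hypothetical, not part of Lucene, LIUS, or Nutch. It uses only JDK classes, assumes a JDK recent enough for try-with-resources, and uses SAX rather than DOM to keep it short. It treats only text:p and text:h elements as paragraph boundaries, which is an assumption that will not cover every document.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: pulls the plain text out of an OpenOffice file.
public class OOFileReader {

    /** Returns a Reader over the extracted text, suitable for a Lucene field. */
    public static Reader open(File f) throws IOException {
        return new StringReader(extractText(f));
    }

    /** Unzips content.xml and collects its character data. */
    public static String extractText(File f) throws IOException {
        final StringBuilder text = new StringBuilder();
        try (ZipFile zip = new ZipFile(f)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("No content.xml found in " + f);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
                parser.parse(in, new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length);
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        // Assumption: paragraphs and headings end a line of text.
                        if ("text:p".equals(qName) || "text:h".equals(qName)) {
                            text.append('\n');
                        }
                    }

                    @Override
                    public InputSource resolveEntity(String publicId, String systemId) {
                        // Older files reference a DTD; return an empty one
                        // so the parser does not try to fetch it.
                        return new InputSource(new StringReader(""));
                    }
                });
            }
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Could not parse content.xml in " + f, e);
        }
        return text.toString();
    }
}
```

If this works the way I hope, the only change in KcmiDocument would be to pass OOFileReader.open(f) instead of new FileReader(f) in the doc.add() call above. Note that the sketch reads the whole document into memory before handing back a Reader, which may not be acceptable for very large files.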

Thanks to all who try to help me out.

Jim S

P.S. If I get it working, I will be happy to post the code to the list.

