Indexing help needed

jim shirreffs Fri, 25 May 2007 10:19:47 -0700

I've been working on this for a while. I am trying to get the demo code that comes with Lucene to index OpenOffice documents. I've looked at the LIUS code and at the Nutch code, but can't find an easy way, so I am digging into the code.

I wrote a KcmiDocument class that returns a Document. In it I call doc.add(), where I specify "contents" and a FileReader:

/*
 * Add the contents of the file to a field named "contents". Specify a Reader,
 * so that the text of the file is tokenized and indexed, but not stored.
 * Note that FileReader expects the file to be in the system's default encoding.
 * If that's not the case, searching for special characters will fail.
 * FileReader is the key; need to add the correct reader for non-text formats.
 */

doc.add(new Field("contents", new FileReader(f)));

Now if I could just add a file reader for OpenOffice, say OOFileReader(), that unzipped the file and did all the DOM stuff, then everything would work and the code changes would be minimal, right? My question is, am I correct in my thinking? And if so, does anyone know of an OOFileReader? If I am not correct, what am I missing here? It is kind of important that I learn how to add different file types like OO or AutoCAD, so we can make a build (with Lucene) or buy call.
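To make the idea concrete, here is a minimal sketch of what such a reader might look like. OOFileReader is the hypothetical name from the question, not an existing Lucene, LIUS, or Nutch class, and the method names extractText() and open() are made up for illustration. It assumes the document is a zip package whose text lives in an entry named content.xml (as in OpenDocument .odt files), and it uses SAX rather than DOM only to keep the example short.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper; the name and methods are assumptions for illustration. */
final class OOFileReader {

    /** Unzips the document and returns the character data found in content.xml. */
    static String extractText(File f) throws Exception {
        final StringBuilder text = new StringBuilder();
        try (ZipFile zip = new ZipFile(f)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("No content.xml in " + f);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // Collect the text between tags.
                        text.append(ch, start, length);
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        // Crude simplification: a space after every element keeps
                        // adjacent paragraphs from running together.
                        text.append(' ');
                    }

                    @Override
                    public InputSource resolveEntity(String publicId, String systemId) {
                        // Skip external DTDs (older files may declare one).
                        return new InputSource(new StringReader(""));
                    }
                });
            }
        }
        return text.toString();
    }

    /** Wraps the extracted text in a Reader, so it can stand in for FileReader. */
    static Reader open(File f) throws Exception {
        return new StringReader(extractText(f));
    }
}
```

With something like this in place, the only change to the document class would be passing OOFileReader.open(f) instead of new FileReader(f) to the Field constructor. The sketch reads the whole document into memory and ignores styles, metadata, and embedded objects; formats that are not zip-plus-XML (AutoCAD, for example) would need their own extractor.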


Thanks to all that try to help me out

Jim S

P.S. If I get it working I will be happy to email or post the code.

