Hi,

The problem:
- I have about 11K HTML documents to index.
- I'm trying to index these documents (along with 3 more small string fields) so that when I search within the "doc" field (the field holding the HTML file content), I get results with snippets or highlights, as I do when using Nutch.
- While going through the wiki I noticed that if I need highlighting on a particular field, I have to make sure that field is both indexed and stored.

But when I try to do the above, after indexing about 3K files, which creates an index of about 800MB (which is fine, as the files are quite lengthy), it keeps giving out-of-heap-space errors.

Things I've tried without much help:
- Increasing Tomcat's memory
- Playing around with autoCommit settings (documents and time)
- Reducing mergeFactor to 5
- Reducing maxBufferedDocs to 100

My question is also: if fields have to be stored in the index to do highlighting or return field content, how do Nutch/Lucene manage without that? (The index Nutch creates for the same documents is much, much smaller.)

Also, when I query the partially added documents with highlighting turned on (for a particular field), it doesn't seem to have any effect.

As you can see, I'm very confused about how to proceed. I hope I'm being clear though :-S

Thanks,
Ravi
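For reference, here is a minimal sketch of the kind of highlighting request described in the post: a search within the "doc" field with highlighting turned on for that field. It uses Solr's standard highlighting parameters (`hl`, `hl.fl`, `hl.snippets`, `hl.fragsize`). The base URL `http://localhost:8080/solr` and the `id` field are hypothetical placeholders, and the sketch assumes "doc" is declared with `indexed="true" stored="true"` in schema.xml. It only shows how the request is formed; it does not address the heap-space errors.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_highlight_query(base_url, text, field="doc", snippets=3, fragsize=200):
    """Build a Solr /select URL that asks for highlighted snippets from `field`.

    Assumptions (hypothetical, not from the original post): the Solr core is
    reachable at `base_url`, and the small string fields include one named "id".
    """
    params = {
        "q": f"{field}:({text})",  # search within the "doc" field
        "hl": "true",              # turn highlighting on
        "hl.fl": field,            # field(s) to build snippets from (must be stored)
        "hl.snippets": snippets,   # max number of snippets per document
        "hl.fragsize": fragsize,   # approximate snippet size in characters
        "fl": "id,score",          # hypothetical "id" field; skip returning the full HTML
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    url = build_highlight_query("http://localhost:8080/solr", "lucene")
    print(url)
    # Decode the query string again to inspect the parameters sent to Solr.
    print(parse_qs(urlparse(url).query))
```

Keeping `fl` small means the response carries only the highlighted fragments of "doc" rather than each stored HTML file in full.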
