Is there a way to not factor norms data into scoring somehow? I'm just stumped as to how Luke is able to do a search (with a limit) on the docs, but in my code it just dies with OutOfMemory errors. How does Luke not allocate these norms?
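What I'd like is some kind of per-field switch so the norms never get written (or loaded) at all. A minimal sketch of the kind of thing I mean, assuming the Lucene 2.4 Field/IndexWriter API and a placeholder index path - as far as I can tell, setOmitNorms(true) drops norms for that field (so no length normalization or index-time boost there), and I assume every document would have to be indexed that way before it actually saves any RAM:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class OmitNormsSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.getDirectory("/path/to/index"); // placeholder path
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);

        Document doc = new Document();
        Field body = new Field("body", "some text", Field.Store.NO, Field.Index.ANALYZED);
        // No norms for this field: no length normalization or index-time boost,
        // but also no byte[maxDoc] array when an IndexReader opens the index.
        body.setOmitNorms(true);
        doc.add(body);

        writer.addDocument(doc);
        writer.close();
    }
}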
________________________________
From: Mark Miller <markrmil...@gmail.com>
To: java-user@lucene.apache.org
Sent: Tuesday, December 23, 2008 5:25:30 PM
Subject: Re: Optimize and Out Of Memory Errors

Mark Miller wrote:
> Lebiram wrote:
>> Also, what are norms
> Norms are a byte value per field stored in the index that is factored into
> the score. It's used for length normalization (shorter documents = more
> important) and index-time boosting. If you want either of those, you need
> norms. When norms are loaded up into an IndexReader, they are loaded into a
> byte[maxDoc] array for each field - so even if only one document out of 400
> million has that field, it will still load byte[maxDoc] for that field (so a
> lot of wasted RAM). Did you say you had 400 million docs and 7 fields?
> Google says that would be:
>
> 400 million x 7 bytes = 2,670.28809 megabytes
>
> On top of your other RAM usage.

Just to avoid confusion, that should really read a byte per document per field. If I remember right, it gives 255 boost possibilities, limited to 25 with length normalization.
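Double-checking that estimate for my own sanity (the numbers are just the 400 million docs and 7 fields from the quoted mail, at one byte per document per field):

public class NormsMemoryEstimate {
    public static void main(String[] args) {
        long maxDoc = 400000000L;  // roughly the document count mentioned above
        int fieldsWithNorms = 7;   // fields that would carry norms
        // One byte per document per field once the norms are loaded.
        long bytes = maxDoc * fieldsWithNorms;
        System.out.printf("%.2f MB%n", bytes / (1024.0 * 1024.0));
        // Prints 2670.29 MB, matching the figure in the quote - on top of everything else on the heap.
    }
}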