> This (very large number of unique terms) is a problem for Lucene currently.
>
> There are some simple improvements we could make to the terms dict
> format to not require so much RAM per term in the terms index...
> LUCENE-1458 (flexible indexing) has these improvements, but
> unfortunately tied in w/ lots of other changes. Maybe we should break
> out a separate issue for this... this'd be a great contained
> improvement, if anyone out there has "the itch" :)
I'm resurrecting an old thread, but this is a concern of mine as well, so I thought I'd add to it. It looks like LUCENE-1458 was resolved on Dec. 3, but I couldn't figure out what the resolution was.

Does Lucene 3.0 have a more memory-friendly alternative to reading the entire .tii file into RAM? If not, would mmap'ing the .tii file and seeking around within the mapping be a better solution than reading the whole file and keeping it in arrays on the heap?
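To make the mmap idea concrete, here is a rough sketch in Java. It does not read the real .tii format. It assumes a made-up index file of fixed-width records (an 8-byte sorted key followed by an 8-byte file pointer), and the class and method names (MmapIndexSketch, writeIndex, map, lookup) are hypothetical. The point is only that a binary search can run directly against the mapped bytes, so the index lives in the OS page cache rather than in heap arrays.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MmapIndexSketch {
    // Hypothetical record layout: 8-byte key + 8-byte pointer. NOT the .tii format.
    static final int RECORD_SIZE = 16;

    // Writes a toy index file; keys must already be sorted ascending.
    static void writeIndex(Path path, long[] sortedKeys, long[] pointers) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(sortedKeys.length * RECORD_SIZE);
        for (int i = 0; i < sortedKeys.length; i++) {
            buf.putLong(sortedKeys[i]).putLong(pointers[i]);
        }
        Files.write(path, buf.array());
    }

    // Maps the whole file read-only. The mapping stays valid after the channel is closed.
    static MappedByteBuffer map(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Binary search over the mapped records using absolute reads.
    // Returns the pointer stored with the greatest key <= target, or -1 if none.
    static long lookup(ByteBuffer index, long target) {
        int lo = 0;
        int hi = index.capacity() / RECORD_SIZE - 1;
        long result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long key = index.getLong(mid * RECORD_SIZE);
            if (key <= target) {
                result = index.getLong(mid * RECORD_SIZE + 8);
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("terms", ".idx");
        p.toFile().deleteOnExit();
        writeIndex(p, new long[] {10, 20, 30}, new long[] {100, 200, 300});
        ByteBuffer index = map(p);
        System.out.println(lookup(index, 25)); // prints 200
        System.out.println(lookup(index, 5));  // prints -1
    }
}
```

Real index terms are variable-length strings, so an actual implementation would also need some kind of offset table (or fixed-size blocks) to make random access possible. A single mapping is also limited to 2 GB in Java, which may or may not matter for a terms index.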