On Thu, 2010-06-10 at 04:03 +0200, fujian wrote:
> Another thing is about unique. I thought it was unique "field value". If it
> means unique term, for English even loading all around 300,000 terms it
> won't take much memory, right? (Suppose the average length of term is 10,
> the total memory usage is 10*300,000=3MB)
It is only the unique field values, but remember that there is also an array of length #docs with pointers to the strings, which takes up 4 or 8 bytes per pointer, depending on whether the JVM is 32-bit or 64-bit. Furthermore, the current Lucene uses Strings, which take up a lot more than just #chars bytes: 300,000 Strings with an average length of 10 chars take up about 18MB.
http://www.javamex.com/tutorials/memory/string_memory_usage.shtml

I'm quietly hacking on a solution for this, but the current code is still at the proof-of-concept stage and way too flaky to use in production:
https://issues.apache.org/jira/browse/LUCENE-2369
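To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Java. It is not Lucene code, and all layout constants are assumptions for a 32-bit HotSpot JVM of this era rather than measurements: 8-byte object headers, 4-byte references, a String that holds its own char[] plus int offset/count/hash fields, and 8-byte object alignment. The document count is a hypothetical example value, not something from this thread.

```java
/**
 * Back-of-the-envelope heap estimate for caching unique terms as Strings
 * plus one pointer per document. All constants are ASSUMPTIONS for a
 * 32-bit HotSpot JVM of the Java 6 era; they are not measured values.
 */
final class TermCacheEstimate {

    static final int OBJECT_HEADER = 8;                 // assumed object header (32-bit JVM)
    static final int ARRAY_HEADER = OBJECT_HEADER + 4;  // header plus 4-byte length field
    static final int REF_SIZE = 4;                      // 4 on 32-bit, 8 on 64-bit without compressed oops
    static final int INT_SIZE = 4;
    static final int ALIGN = 8;                         // objects padded to a multiple of 8 bytes

    /** Rounds a byte count up to the assumed object alignment. */
    static long align(long bytes) {
        return (bytes + ALIGN - 1) / ALIGN * ALIGN;
    }

    /** Bytes for one String of the given length: the String object plus its own char[]. */
    static long stringBytes(int chars) {
        // String fields (assumed old layout): char[] reference, int offset, int count, int hash.
        long stringObject = align(OBJECT_HEADER + REF_SIZE + 3 * INT_SIZE);
        // Java chars are UTF-16, i.e. 2 bytes each, not 1 byte as in the naive estimate.
        long charArray = align(ARRAY_HEADER + 2L * chars);
        return stringObject + charArray;
    }

    /** Bytes for an array holding {@code length} references. */
    static long refArrayBytes(long length) {
        return align(ARRAY_HEADER + (long) REF_SIZE * length);
    }

    public static void main(String[] args) {
        long uniqueTerms = 300000;
        int avgChars = 10;
        long numDocs = 1000000; // hypothetical index size, for illustration only

        long naive = uniqueTerms * avgChars;              // 1 byte per char, no overhead
        long strings = uniqueTerms * stringBytes(avgChars);
        long perDocPointers = refArrayBytes(numDocs);     // one pointer per document

        System.out.printf("naive estimate (1 byte/char): %,d bytes%n", naive);
        System.out.printf("unique term Strings:          %,d bytes%n", strings);
        System.out.printf("per-doc pointer array:        %,d bytes%n", perDocPointers);
    }
}
```

Under these assumptions a 10-char String costs 56 bytes (24 for the String object, 32 for the char[]), so 300,000 of them come to 16,800,000 bytes, in the same ballpark as the figure above and far from the naive 3MB. Setting REF_SIZE to 8 shows how the per-document array doubles on a 64-bit JVM without compressed pointers.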