Philippe Kernévez [pkerne...@octo.com] wrote:
> We use Lucene 2.4 (provided by Alfresco).

Lucene 2.4 is 6 years old. The obvious advice is to upgrade, but I guess you have your reasons not to.

> We looked at a memory dump with Eclipse Memory Analyzer, and we were quite
> surprised to see that most of that memory is kept by enormous String[] that
> are yet mostly empty.

I am guessing you have a lot of documents in your index and that you are sorting on at least one String field?

http://www.lhelper.org/dev/lucene-2.4.0/docs/api/org/apache/lucene/search/Sort.html states that sorting on String in Lucene means that all Strings for that field are kept in memory. There has to be one entry in the String array(s) for each document, even if the document does not have a value for that field.

If my guess is correct, the solution is to reduce the number of String sort fields, ideally to 0. Maybe you can use an integer field instead by doing some mapping?

> In our case we need to have some very short words indexed, so we deactivate
> 'stop words'. If we want to have the list of Terms ordered by their index size,
> what is a good tool to do that (Luke?) and how can we do such a request?

Luke has term statistics built in. I don't remember the details, but I recall that it was straightforward.

- Toke Eskildsen

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
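PS: To make the "one entry per document" point and the integer-mapping idea a bit more concrete, here is a minimal sketch in plain Java. It does not call any Lucene API. The class and method names (SortCacheSketch, perDocumentValues, perDocumentOrdinals) are made up for illustration, and the code only mimics the layout described above, assuming the relative order of the strings is all the sort needs.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Illustration only: hypothetical names, no Lucene classes involved.
public class SortCacheSketch {

    // One slot per document id. Documents without a value for the
    // field still occupy a (null) slot, so the array has maxDoc entries.
    static String[] perDocumentValues(int maxDoc, int[] docIds, String[] values) {
        String[] cache = new String[maxDoc];
        for (int i = 0; i < docIds.length; i++) {
            cache[docIds[i]] = values[i];
        }
        return cache;
    }

    // Count the slots that hold no value.
    static int countEmpty(String[] cache) {
        int empty = 0;
        for (String value : cache) {
            if (value == null) {
                empty++;
            }
        }
        return empty;
    }

    // Map every string to its 1-based rank among the distinct values,
    // using 0 for "no value". Sorting by these ints gives the same
    // order as sorting by the strings (documents without a value first).
    static int[] perDocumentOrdinals(String[] cache) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String value : cache) {
            if (value != null) {
                distinct.add(value);
            }
        }
        String[] sorted = distinct.toArray(new String[0]);
        int[] ordinals = new int[cache.length];
        for (int doc = 0; doc < cache.length; doc++) {
            if (cache[doc] != null) {
                ordinals[doc] = Arrays.binarySearch(sorted, cache[doc]) + 1;
            }
        }
        return ordinals;
    }

    public static void main(String[] args) {
        // 6 documents, only docs 1 and 4 have a value for the sort field.
        String[] cache = perDocumentValues(6, new int[] {1, 4}, new String[] {"pear", "apple"});
        System.out.println("slots: " + cache.length + ", empty: " + countEmpty(cache));
        System.out.println("ordinals: " + Arrays.toString(perDocumentOrdinals(cache)));
    }
}
```

In a real setup the mapping would presumably be computed at index time and stored in an integer field, so that no per-document String array is needed for sorting. Note that such ranks have to be recomputed whenever a new value falls between two existing ones.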