Re: MemoryUsage of sorting

Chris Hostetter Wed, 28 Jun 2006 21:40:43 -0700

: some OutOfMemory errors. If I understand it correctly, each unique term
: in a field is read into a cache, when I use Searcher.search(Query query,
: Sort sort) with one SortField. So even if my query only finds 5


Minor clarification: if the sort type is one of the numeric types, then an
array of that type is created, the same size as the number of docs in
your index -- regardless of how many unique terms there are.  If the sort
type is String, then a String[] of all the unique Term values is created,
*and* an array of ints (one per document) is created to use as an index
into that String[].
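
To make the two layouts concrete, here is a rough sketch in plain Java. It
is not Lucene code: the class and method names (SortCacheSketch,
StringIndexSketch, build, numericCacheBytes) are made up for illustration,
it assumes every document has exactly one non-null value, and it builds the
arrays from a per-document value list rather than by walking the terms the
way the real cache does.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical model of the two sort-cache layouts; not the Lucene API.
class SortCacheSketch {

    // String case: sorted unique values plus one int per document.
    static final class StringIndexSketch {
        final String[] lookup; // every unique term value, sorted
        final int[] order;     // order[doc] = position of that doc's value in lookup

        StringIndexSketch(String[] lookup, int[] order) {
            this.lookup = lookup;
            this.order = order;
        }
    }

    // Assumption: valueByDoc[doc] is the single, non-null value of the sort field.
    static StringIndexSketch build(String[] valueByDoc) {
        String[] lookup = new TreeSet<>(Arrays.asList(valueByDoc)).toArray(new String[0]);
        int[] order = new int[valueByDoc.length];
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            order[doc] = Arrays.binarySearch(lookup, valueByDoc[doc]);
        }
        return new StringIndexSketch(lookup, order);
    }

    // Numeric case: one slot per document, independent of the number of unique terms.
    // Ignores the small fixed overhead of the array object itself.
    static long numericCacheBytes(int numDocs, int bytesPerValue) {
        return (long) numDocs * bytesPerValue;
    }
}
```

With these assumptions, an int sort over an index of 5,000,000 docs needs
roughly 5,000,000 * 4 bytes = 20,000,000 bytes, whether the query matches 5
docs or all of them. The String case pays for the int-per-doc array as well
as the unique values themselves.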

: documents, Lucene would start to build a cache of maybe a few million
: unique field entries, which would then be re-used for a further queries.

correct - the assumption is that if you are sorting on field "foo" in this
query, there will probably be another query you want to sort on field
"foo" in the near future.

: Is this correct? It would probably be best practice, to do sorting
: yourself, if many unique terms are concerned for only a few search
: results, especially since we have lots of index updates, which makes
: reusing IndexSearcher a little lot harder.

It might make sense to do that ... but you would have to use "STORED"
fields to do that -- typically searching is done on INDEXED fields,
because it's very easy/fast to build up the FieldCache for every document
by walking the TermEnum of the indexed fields; but if you don't want to
build the FieldCache for every document you'll need some way to get a
value for each of the matching documents -- STORED fields seem like the
most logical.

You could probably write a pretty generic SortComparatorSource that would
do this.
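
The core of the idea, stripped of any Lucene plumbing, might look something
like the sketch below. It is only an illustration under assumptions: the
storedValue function is a hypothetical stand-in for however you load the
STORED field of a document, HitOnlySortSketch and sortHits are made-up
names, every hit is assumed to have a non-null value, and nothing here
implements SortComparatorSource itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch: sort only the matching docs by a stored value,
// instead of caching a value for every document in the index.
class HitOnlySortSketch {

    // hitDocIds: ids of the docs that matched the query.
    // storedValue: assumed lookup of the stored sort-field value for one doc.
    static int[] sortHits(int[] hitDocIds, IntFunction<String> storedValue) {
        // Load each hit's value exactly once; memory scales with the hits only.
        Map<Integer, String> loaded = new HashMap<>();
        for (int doc : hitDocIds) {
            loaded.put(doc, storedValue.apply(doc));
        }

        Integer[] boxed = Arrays.stream(hitDocIds).boxed().toArray(Integer[]::new);
        Arrays.sort(boxed, (a, b) -> {
            int c = loaded.get(a).compareTo(loaded.get(b));
            return c != 0 ? c : Integer.compare(a, b); // tie-break on doc id
        });
        return Arrays.stream(boxed).mapToInt(Integer::intValue).toArray();
    }
}
```

The trade-off is that nothing is kept around for the next query: each
search pays the cost of loading the stored values for its own hits, which
is only a win when the result sets are small compared to the index.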


-Hoss

