Re: Group by in Lucene ?

Mark Miller Wed, 28 Jan 2009 08:03:24 -0800

Group-by in Lucene/Solr has not been solved in a great general way yet, to my knowledge.

Ideally, we would want a solution that does not need to fit into memory. However, you need the value of the field for each document to do the grouping. As you are finding, this is not cheap to get. Currently, the efficient way to get it is to use a FieldCache. This, however, requires that every distinct value can fit into memory.

Once you have efficient access to the values, you need to be able to efficiently group the results, again not bounded by memory (which we already are with the FieldCache).

There are quite a few ways to do this. The simplest is to group until you have used all the memory you want. Then, for everything left: if a value matches an existing group, increment that group's count; if it doesn't, write it to an overflow file. Use the overflow file as the input to the next run, and repeat until there is no overflow. You can improve on that by partitioning the overflow file.
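
A minimal sketch of that overflow loop, in plain Java with no Lucene calls. Everything here is an assumption made for illustration: the class and method names are hypothetical, the input is a text file with one field value per line, "memory" is approximated by a cap on the number of groups held at once (maxGroups), and finished counts are handed to a callback rather than kept in memory.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class OverflowGrouper {

    /**
     * Counts occurrences of each value (one value per line in 'input'),
     * holding at most maxGroups distinct groups in memory per pass.
     * Finished (value, count) pairs are passed to 'sink'.
     */
    public static void groupCounts(Path input, int maxGroups,
                                   BiConsumer<String, Integer> sink) throws IOException {
        if (maxGroups < 1) {
            throw new IllegalArgumentException("maxGroups must be >= 1");
        }
        Path current = input;
        boolean currentIsTemp = false; // never delete the caller's input file

        while (true) {
            Map<String, Integer> counts = new HashMap<>();
            Path overflow = Files.createTempFile("groupby-overflow", ".txt");
            long overflowed = 0;

            try (BufferedReader in = Files.newBufferedReader(current, StandardCharsets.UTF_8);
                 BufferedWriter out = Files.newBufferedWriter(overflow, StandardCharsets.UTF_8)) {
                String value;
                while ((value = in.readLine()) != null) {
                    Integer seen = counts.get(value);
                    if (seen != null) {
                        counts.put(value, seen + 1);   // matches an existing group
                    } else if (counts.size() < maxGroups) {
                        counts.put(value, 1);          // still room for a new group
                    } else {
                        out.write(value);              // no room: defer to the next pass
                        out.newLine();
                        overflowed++;
                    }
                }
            }

            // The group map only grows during a pass, so a value is either
            // counted entirely in this pass or deferred entirely to overflow.
            counts.forEach(sink);

            if (currentIsTemp) {
                Files.delete(current);
            }
            if (overflowed == 0) {
                Files.delete(overflow);
                return;
            }
            current = overflow;     // the overflow file is the next pass's input
            currentIsTemp = true;
        }
    }
}
```

Each pass settles at least one group, so the overflow file shrinks every time and the loop terminates. With many distinct values and a small cap the number of passes grows quickly, which is where partitioning the overflow file (for example by hash of the value) would help.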


And then there are a dozen other methods.

Solr has a patch in JIRA that uses a sorting method. First the results are sorted on the group-by field, then scanned through for grouping; all field values that are the same will be next to each other. Finally, if you really wanted to sort on a different field, another sort is applied. That's not ideal IMO, but it's a start.
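
The sort-then-scan idea can be sketched roughly as follows. This is not the code from the Solr patch; it is a self-contained illustration in plain Java, and the Hit and Group types, their fields, and the choice to order groups by their best score in the final sort are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortScanGrouper {

    /** Hypothetical search hit: doc id, group-by field value, and a score. */
    public static final class Hit {
        final int docId;
        final String groupValue;
        final float score;

        public Hit(int docId, String groupValue, float score) {
            this.docId = docId;
            this.groupValue = groupValue;
            this.score = score;
        }
    }

    /** One group: the shared field value, its doc ids, and its best score. */
    public static final class Group {
        final String key;
        final List<Integer> docIds = new ArrayList<>();
        float topScore = Float.NEGATIVE_INFINITY;

        Group(String key) {
            this.key = key;
        }
    }

    public static List<Group> group(List<Hit> hits) {
        // Step 1: sort on the group-by field so equal values become adjacent.
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Hit h) -> h.groupValue));

        // Step 2: a single scan; a new group starts whenever the value changes.
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (Hit h : sorted) {
            if (current == null || !current.key.equals(h.groupValue)) {
                current = new Group(h.groupValue);
                groups.add(current);
            }
            current.docIds.add(h.docId);
            current.topScore = Math.max(current.topScore, h.score);
        }

        // Step 3: the extra sort on a different field (here: best score, descending).
        groups.sort((a, b) -> Float.compare(b.topScore, a.topScore));
        return groups;
    }
}
```

The scan itself needs only the current group in memory, but this sketch sorts an in-memory list, so it does not address the memory bound; an external sort would be needed for that.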


- Mark

