Re: GROUP BY in Lucene

Rob Audenaerde Sat, 19 Mar 2016 13:04:04 -0700

Hi Gimantha,

You don't need to store the aggregates, and you don't need to retrieve
Documents. The aggregates are calculated during collection using the
BinaryDocValues from the facet module. What I do need is to store the
values in the facets using AssociationFacetFields (for example
FloatAssociationFacetField). I chose facets simply because then I can use
the facets as well :)


I have an implementation of the `Facets` class that does all the
aggregation. I cannot paste all the code unfortunately, but here is the idea
(it is loosely based on the TaxonomyFacetSumIntAssociations implementation,
where you can look up how the BinaryDocValues are translated to ordinals and
to facets). This aggregation is used in conjunction with a FacetsCollector,
which collects the facets during a search:

        FacetsCollector fc = new FacetsCollector();
        searcher.search(new ConstantScoreQuery(query), fc);
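
For reference, a minimal stdlib-only sketch of how such a BinaryDocValues
payload can be turned back into (ordinal, value) pairs. It assumes the
layout used by the Lucene 5.x association facets: consecutive 8-byte
entries, each a big-endian 4-byte ordinal followed by the 4 bytes of
Float.floatToIntBits(value). The names AssociationPayload, encode and decode
are made up for illustration; in real code the bytes would come from the
BytesRef of the current document rather than from encode().

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: mimics the assumed payload layout.
final class AssociationPayload {

    // Writes each (ordinal, value) pair as 4 bytes of ordinal + 4 bytes of float bits.
    static byte[] encode(int[] ordinals, float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(ordinals.length * 8); // big-endian by default
        for (int i = 0; i < ordinals.length; i++) {
            buf.putInt(ordinals[i]);
            buf.putInt(Float.floatToIntBits(values[i]));
        }
        return buf.array();
    }

    // Reads the pairs back; values for a repeated ordinal are summed.
    static Map<Integer, Float> decode(byte[] bytes, int offset, int length) {
        Map<Integer, Float> result = new LinkedHashMap<>();
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        while (buf.remaining() >= 8) {
            int ordinal = buf.getInt();
            float value = Float.intBitsToFloat(buf.getInt());
            result.merge(ordinal, value, Float::sum);
        }
        return result;
    }
}
```

The offset and length parameters mirror the way a BytesRef exposes a slice
of a larger byte array.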


Then use this FacetsCollector:

     taxoReader = getTaxonomyReaderManager().acquire();
     OnePassTaxonomyFacets facets =
             new OnePassTaxonomyFacets(taxoReader, LuceneIndexConfig.facetConfig);
     // 'result' is just an illustrative variable name
     Collection<GroupByResultTuple> result =
             facets.aggregateValues(fc.getMatchingDocs(), p.getGroupByListWithoutData(), aggregateFields);


The aggregateValues method (cannot paste it all :(  ) :


    public final Collection<GroupByResultTuple> aggregateValues(
            List<MatchingDocs> matchingDocs,
            final List<GroupByField> groupByFields,
            final List<String> aggregateFieldNames,
            EmptyValues emptyValues) throws IOException {
        LOG.info("Starting aggregation for pivot.. EmptyValues=" + emptyValues);

        // We want to group a list of ordinals to a list of aggregates. The
        // taxoReader has the ordinals, so a selection like
        // 'Lang=NL, Region=South' will end up as a MultiIntKey of [13,44].
        Map<MultiIntKey, List<TotalFacetAgg>> aggs = Maps.newHashMap();

        List<String> groupByFieldsNames = Lists.newArrayList();
        for (GroupByField gbf : groupByFields) {
            groupByFieldsNames.add(gbf.getField().getName());
        }
        int groupByCount = groupByFieldsNames.size();

        // We need to know which ordinals are the 'group-by' ordinals, so we
        // can check whether an ordinal we find belongs to one of these fields.
        int[] groupByOrdinals = new int[groupByCount];
        for (int i = 0; i < groupByOrdinals.length; i++) {
            groupByOrdinals[i] = this.getOrdinalForListItem(groupByFieldsNames, i);
        }

        // We need to know which ordinals are the 'aggregate-field' ordinals,
        // so we can check whether an ordinal we find belongs to one of these
        // fields.
        int[] aggregateOrdinals = new int[aggregateFieldNames.size()];
        for (int i = 0; i < aggregateOrdinals.length; i++) {
            aggregateOrdinals[i] = this.getOrdinalForListItem(aggregateFieldNames, i);
        }

        // Now we go and find all the ordinals in the matching documents.
        // For each ordinal, we check whether it is a group-by ordinal or an
        // aggregate ordinal, and act accordingly.
        for (MatchingDocs hitList : matchingDocs) {
            BinaryDocValues dv = hitList.context.reader().getBinaryDocValues(this.indexFieldName);

            // Here, find the ordinals of the group-by fields and the
            // aggregate fields. Create a MultiIntKey from the group-by
            // ordinals and use it to add the current values of the fields
            // to aggregate to the facet aggregates for that key.

            ......
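
This is not the code left out above, only a stdlib-only sketch of the
grouping idea described in the comments. Everything in it is an assumption
made for illustration: the GroupBySketch and DocEntries names, the stand-in
MultiIntKey class, a plain float[] of sums instead of TotalFacetAgg, and the
rule that an ordinal "belongs" to a field when its parent in a taxonomy-style
parents array is that field's ordinal. Each DocEntries holds the
(ordinal, value) pairs already decoded for one matching document.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GroupBySketch {

    // Stand-in for MultiIntKey: wraps an int[] with value-based equals/hashCode.
    static final class MultiIntKey {
        private final int[] ords;
        MultiIntKey(int[] ords) { this.ords = ords.clone(); }
        @Override public boolean equals(Object o) {
            return o instanceof MultiIntKey && Arrays.equals(ords, ((MultiIntKey) o).ords);
        }
        @Override public int hashCode() { return Arrays.hashCode(ords); }
        @Override public String toString() { return Arrays.toString(ords); }
    }

    // The (ordinal, value) pairs decoded for one matching document.
    static final class DocEntries {
        final int[] ordinals;
        final float[] values;
        DocEntries(int[] ordinals, float[] values) {
            this.ordinals = ordinals;
            this.values = values;
        }
    }

    private static int indexOf(int[] array, int wanted) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] == wanted) return i;
        }
        return -1;
    }

    // Sums the aggregate-field values per combination of group-by ordinals.
    static Map<MultiIntKey, float[]> aggregate(List<DocEntries> docs, int[] parents,
                                               int[] groupByOrdinals, int[] aggregateOrdinals) {
        Map<MultiIntKey, float[]> sums = new HashMap<>();
        for (DocEntries doc : docs) {
            int[] key = new int[groupByOrdinals.length];
            Arrays.fill(key, -1); // -1: document has no value for that group-by field
            float[] docValues = new float[aggregateOrdinals.length];
            for (int i = 0; i < doc.ordinals.length; i++) {
                int ord = doc.ordinals[i];
                int parent = parents[ord];
                int g = indexOf(groupByOrdinals, parent);
                if (g >= 0) key[g] = ord;                    // ordinal is a group-by value
                int a = indexOf(aggregateOrdinals, parent);
                if (a >= 0) docValues[a] += doc.values[i];   // ordinal carries a value to sum
            }
            float[] total = sums.computeIfAbsent(new MultiIntKey(key),
                    k -> new float[aggregateOrdinals.length]);
            for (int a = 0; a < total.length; a++) {
                total[a] += docValues[a];
            }
        }
        return sums;
    }
}
```

Using the ordinals as the key avoids resolving labels for every document;
the labels only need to be looked up once per resulting group.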


Hope this helps :)
-Rob
