Hi all,

in Lucene 4.1, after some advice from the mailing list, I am merging taxonomies (in memory, because the taxonomy indexes are small) and collecting facet values from the merged taxonomy instead of from the individual ones. The scenario is:
- you have a MultiReader pointing to several indexes
- you are querying the MultiReader
- you want to collect facets for the MultiReader
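For context, the setup looks roughly like this (a sketch only; the directory paths and variable names are placeholders, not my real ones):

    // one content reader per index, combined into a single MultiReader
    DirectoryReader reader1 = DirectoryReader.open(FSDirectory.open(new File("/path/to/index1")));
    DirectoryReader reader2 = DirectoryReader.open(FSDirectory.open(new File("/path/to/index2")));
    MultiReader multiReader = new MultiReader(reader1, reader2);

    // all queries go through the MultiReader; facets should be collected
    // across all the underlying indexes at once
    IndexSearcher searcher = new IndexSearcher(multiReader);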
What I'm doing:

-1- taxonomy merging

    long createStart = System.currentTimeMillis();
    catMergeDir = new RAMDirectory();
    readerOrdinalsMap = new HashMap<AtomicReader, DirectoryTaxonomyWriter.OrdinalMap>();
    DirectoryTaxonomyWriter taxoMergeWriter = new DirectoryTaxonomyWriter(catMergeDir);
    Directory taxoDirectory = null;
    IndexReader contentReader = null;
    OrdinalMap[] ordinalMapsArray = new DirectoryTaxonomyWriter.MemoryOrdinalMap[taxoIdxRepoArray.length];
    for (int idx = 0; idx < taxoIdxRepoArray.length; idx++) {
        taxoDirectory = LuceneDirectoryFactory.getDirectory(taxoIdxRepoArray[idx]);
        contentReader = idxReaderArray[idx];
        ordinalMapsArray[idx] = new DirectoryTaxonomyWriter.MemoryOrdinalMap();
        // merge this taxonomy into the combined one and remember the ordinal mapping
        taxoMergeWriter.addTaxonomy(taxoDirectory, ordinalMapsArray[idx]);
        // associate every atomic reader of this content index with its ordinal map
        for (AtomicReaderContext readerCtx : contentReader.leaves()) {
            readerOrdinalsMap.put(readerCtx.reader(), ordinalMapsArray[idx]);
        }
    }
    taxoMergeWriter.close();
    log.info(String.format("Taxonomy merge time elapsed: %s(ms)", System.currentTimeMillis() - createStart));

------

From the code above I'm holding:
- catMergeDir: the directory containing the merged categories
- readerOrdinalsMap: a map containing the ordinal mapping for every AtomicReader in the MultiReader

-2- an aggregator based on the ordinal maps built in -1-

    class OrdinalMappingCountingAggregator extends CountingAggregator {

        private int[] ordinalMap;

        public OrdinalMappingCountingAggregator(int[] counterArray) {
            super(counterArray);
        }

        @Override
        public void aggregate(int docID, float score, IntsRef ordinals) throws IOException {
            int upto = ordinals.offset + ordinals.length;
            for (int i = ordinals.offset; i < upto; i++) {
                // original ordinal read for the AtomicReader given to setNextReader
                int ordinal = ordinals.ints[i];
                // mapped ordinal, following the taxonomy merge
                int mappedOrdinal = ordinalMap[ordinal];
                // count the mapped ordinal instead, so all AtomicReaders count into the same slot
                counterArray[mappedOrdinal]++;
            }
        }

        @Override
        public boolean setNextReader(AtomicReaderContext ctx) throws IOException {
            if (readerOrdinalsMap.get(ctx.reader()) == null) {
                return false;
            }
            ordinalMap = readerOrdinalsMap.get(ctx.reader()).getMap();
            return true;
        }
    }

-3- override CountFacetRequest.createAggregator(..) to return the aggregator from -2-

    return new CountFacetRequest(cp, maxCount) {
        @Override
        public Aggregator createAggregator(boolean useComplements, FacetArrays arrays, TaxonomyReader taxonomy) {
            int[] a = arrays.getIntArray();
            return new OrdinalMappingCountingAggregator(a);
        }
    };

--------

In 4.2 this no longer works, and I'm not collecting facet values from the merged taxonomy.

First problem I realized: the new API FacetsCollector.create(FacetSearchParams fsp, IndexReader indexReader, TaxonomyReader taxoReader) returns collectors and accumulators that never call FacetRequest.createAggregator(). You have to use FacetsCollector.create(FacetsAccumulator accumulator) instead, passing it a StandardFacetsAccumulator (the only accumulator that still calls FacetRequest.createAggregator(..)); see the P.S. below for how I'm wiring this.

Second problem: even when using the StandardFacetsAccumulator it does not work, because the facet counts are wrong.

Any advice on why this is happening? I'm also going to look at how to apply this idea to mimic the behaviour of FastCountingFacetsAggregator, which I think should be the right way to go.

I hope I gave enough information; any help in better understanding how facets changed in 4.2 would be appreciated.

Nicola.
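P.S. For completeness, this is roughly how I'm wiring things in 4.2. It's only a sketch: mappedCountRequest stands for the anonymous CountFacetRequest from step -3-, searcher/query/multiReader/catMergeDir come from the scenario above, and I'm assuming the StandardFacetsAccumulator(FacetSearchParams, IndexReader, TaxonomyReader) constructor, so please correct me if I'm reading the 4.2 API wrong:

    // open the merged taxonomy written in step -1-
    TaxonomyReader mergedTaxoReader = new DirectoryTaxonomyReader(catMergeDir);

    // the facet request overridden in step -3-, so it returns the ordinal-mapping aggregator
    FacetSearchParams fsp = new FacetSearchParams(mappedCountRequest);

    // StandardFacetsAccumulator is, as far as I can tell, the only accumulator
    // in 4.2 that still calls FacetRequest.createAggregator(..)
    StandardFacetsAccumulator accumulator = new StandardFacetsAccumulator(fsp, multiReader, mergedTaxoReader);
    FacetsCollector facetsCollector = FacetsCollector.create(accumulator);

    // collect facets over the whole MultiReader
    searcher.search(query, facetsCollector);
    List<FacetResult> results = facetsCollector.getFacetResults();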