Hi,

I'm currently calling:

    FacetsCollector.create(new StandardFacetsAccumulator(facetSearchParams,
        indexReader, getTaxonomyReader()))

which in turn calls FacetRequest.createAggregator(...), and it is not
working properly; I'm extending CountingAggregator (and therefore
Aggregator). If I instead override FacetsAccumulator.getAggregator()
and so on, what is the difference between the two calls? I mean, the
difference between:

    Aggregator.aggregate(int docID, float score, IntsRef ordinals)

and

    FacetsAggregator.aggregate(FacetsCollector.MatchingDocs matchingDocs,
        CategoryListParams clp, FacetArrays facetArrays)

I suppose I can take the code from FastCountingFacetsAggregator,
recalculate each ordinal based on the merged taxonomy, and then count
at the correct position in facetArrays.getIntArray().
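As far as I can tell from the sources, the old Aggregator.aggregate()
is fed one matching document at a time (with setNextReader() called per
segment), while the new FacetsAggregator.aggregate() is called once per
segment with all matching documents bundled in MatchingDocs, so the
aggregator drives the iteration itself.

This is a rough, untested sketch of what I have in mind for 4.2. It is
modeled on the plain CountingFacetsAggregator rather than
FastCountingFacetsAggregator (the fast one, as far as I can see,
assumes the default DGapVInt encoding and decodes the ordinals itself).
The class name is mine; readerOrdinalsMap is the
Map<AtomicReader, DirectoryTaxonomyWriter.OrdinalMap> built while
merging the taxonomies (see my original mail below), and I'm assuming
IntRollupFacetsAggregator is the right base class to inherit the count
rollup from:

    // Untested sketch (Lucene 4.2 APIs): count each document's ordinals
    // at their positions in the *merged* taxonomy, using the per-reader
    // OrdinalMap recorded during the taxonomy merge.
    class OrdinalMappingFacetsAggregator extends IntRollupFacetsAggregator {

      private final IntsRef ordinals = new IntsRef(32);

      @Override
      public void aggregate(FacetsCollector.MatchingDocs matchingDocs,
          CategoryListParams clp, FacetArrays facetArrays) throws IOException {
        DirectoryTaxonomyWriter.OrdinalMap om =
            readerOrdinalsMap.get(matchingDocs.context.reader());
        if (om == null) {
          return; // no ordinal map recorded for this segment's reader
        }
        final int[] map = om.getMap();
        final int[] counts = facetArrays.getIntArray();

        final CategoryListIterator cli = clp.createCategoryListIterator(0);
        if (!cli.setNextReader(matchingDocs.context)) {
          return; // this segment has no category list data
        }

        final int length = matchingDocs.bits.length();
        int doc = 0;
        while (doc < length && (doc = matchingDocs.bits.nextSetBit(doc)) != -1) {
          cli.getOrdinals(doc, ordinals);
          final int upto = ordinals.offset + ordinals.length;
          for (int i = ordinals.offset; i < upto; i++) {
            // count at the merged-taxonomy position, not the local one
            counts[map[ordinals.ints[i]]]++;
          }
          ++doc;
        }
      }
    }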
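If I understand Shai's suggestion below correctly, the wiring would
then look something like this (also untested; mergedTaxoReader is a
DirectoryTaxonomyReader I open on catMergeDir, the merged taxonomy
directory from my original mail):

    // Untested wiring sketch: have the default FacetsAccumulator hand
    // out the remapping aggregator, against the merged taxonomy reader.
    FacetsAccumulator fa = new FacetsAccumulator(facetSearchParams,
        indexReader, mergedTaxoReader) {
      @Override
      public FacetsAggregator getAggregator() {
        return new OrdinalMappingFacetsAggregator();
      }
    };
    FacetsCollector fc = FacetsCollector.create(fa);
    searcher.search(query, fc);
    List<FacetResult> facetResults = fc.getFacetResults();

The important part, I think, is passing the merged taxonomy reader to
the accumulator, so that the FacetArrays are sized for the merged
ordinal space and the labels are resolved against the merged taxonomy.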
Nicola.

On Thu, 2013-04-11 at 13:23 +0300, Shai Erera wrote:
> Hi Nicola,
>
> I didn't read the code examples, but I'll relate to your last question
> regarding the Aggregator. Indeed, with Lucene 4.2,
> FacetRequest.createAggregator is not called by the default
> FacetsAccumulator. This method should go away from FacetRequest
> entirely, but unfortunately we did not finish all the refactoring work
> before 4.2.
>
> What you should do is extend the new FacetsAggregator and override
> FacetsAccumulator.getAggregator(). Can you try that and let us know if
> that resolves your problem?
>
> Shai
>
> On Thu, Apr 11, 2013 at 1:05 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:
>     Hi all,
>
>     in Lucene 4.1, after some advice from the mailing list, I am
>     merging taxonomies (in memory, because the taxonomy indexes are
>     small) and collecting facet values from the merged taxonomy
>     instead of the single ones; the scenario is:
>     - you have a MultiReader pointing to several indexes
>     - you are querying the MultiReader
>     - you want to collect facets for the MultiReader
>
>     What I'm doing:
>
>     -1- taxonomy merging
>
>     long createStart = System.currentTimeMillis();
>     catMergeDir = new RAMDirectory();
>     readerOrdinalsMap =
>         new HashMap<AtomicReader, DirectoryTaxonomyWriter.OrdinalMap>();
>     DirectoryTaxonomyWriter taxoMergeWriter =
>         new DirectoryTaxonomyWriter(catMergeDir);
>     Directory taxoDirectory = null;
>     IndexReader contentReader = null;
>     OrdinalMap[] ordinalMapsArray =
>         new DirectoryTaxonomyWriter.MemoryOrdinalMap[taxoIdxRepoArray.length];
>
>     for (int idx = 0; idx < taxoIdxRepoArray.length; idx++) {
>       taxoDirectory =
>           LuceneDirectoryFactory.getDirectory(taxoIdxRepoArray[idx]);
>       contentReader = idxReaderArray[idx];
>       ordinalMapsArray[idx] = new DirectoryTaxonomyWriter.MemoryOrdinalMap();
>       taxoMergeWriter.addTaxonomy(taxoDirectory, ordinalMapsArray[idx]);
>
>       // remember which ordinal map belongs to each atomic reader
>       for (AtomicReaderContext readerCtx : contentReader.leaves()) {
>         readerOrdinalsMap.put(readerCtx.reader(), ordinalMapsArray[idx]);
>       }
>     }
>     taxoMergeWriter.close();
>     log.info(String.format("Taxonomy merge time elapsed: %s(ms)",
>         System.currentTimeMillis() - createStart));
>
>     ------
>     From the code above I'm holding:
>     - catMergeDir: the directory containing the merged categories
>     - readerOrdinalsMap: a map holding the OrdinalMap for every reader
>       in the MultiReader
>
>     -2- aggregator based on the ordinal maps constructed in -1-
>
>     class OrdinalMappingCountingAggregator extends CountingAggregator {
>       private int[] ordinalMap;
>
>       public OrdinalMappingCountingAggregator(int[] counterArray) {
>         super(counterArray);
>       }
>
>       @Override
>       public void aggregate(int docID, float score, IntsRef ordinals)
>           throws IOException {
>         int upto = ordinals.offset + ordinals.length;
>         for (int i = ordinals.offset; i < upto; i++) {
>           // original ordinal read for the AtomicReader given to
>           // setNextReader
>           int ordinal = ordinals.ints[i];
>           // mapped ordinal, following the taxonomy merge
>           int mappedOrdinal = ordinalMap[ordinal];
>           // count the mapped ordinal instead, so all AtomicReaders
>           // count that ordinal
>           counterArray[mappedOrdinal]++;
>         }
>       }
>
>       @Override
>       public boolean setNextReader(AtomicReaderContext ctx)
>           throws IOException {
>         if (readerOrdinalsMap.get(ctx.reader()) == null) {
>           return false;
>         }
>         ordinalMap = readerOrdinalsMap.get(ctx.reader()).getMap();
>         return true;
>       }
>     }
>     -3- override CountFacetRequest.createAggregator(..) to return -2-
>
>     return new CountFacetRequest(cp, maxCount) {
>
>       @Override
>       public Aggregator createAggregator(boolean useComplements,
>           FacetArrays arrays, TaxonomyReader taxonomy) {
>         int[] a = arrays.getIntArray();
>         return new OrdinalMappingCountingAggregator(a);
>       }
>     };
>
>     --------
>     In 4.2 this no longer works, and I'm not collecting facet values
>     from the merged taxonomy.
>
>     The first problem I realized: the new API
>     FacetsCollector.create(FacetSearchParams fsp, IndexReader
>     indexReader, TaxonomyReader taxoReader) gives back collectors and
>     accumulators that never call FacetRequest.createAggregator(). You
>     have to use the API FacetsCollector.create(FacetsAccumulator
>     accumulator), passing it a StandardFacetsAccumulator (the only one
>     that will call FacetRequest.createAggregator(..)).
>
>     Second: even using the StandardFacetsAccumulator it does not work,
>     because the facet counting is wrong. Any advice on why this is
>     happening?
>
>     I'm also going to check how to use this idea to mimic the
>     behaviour of the FastCountingFacetsAggregator, which I think
>     should be the right way.
>
>     I hope I gave enough information; any help in better understanding
>     how facets changed in 4.2 will be appreciated.
>
>     Nicola.