Hi all,

in Lucene 4.1, after some advice from the mailing list, I am merging taxonomies (in memory, because the taxonomy indexes are small) and collecting facet values from the merged taxonomy instead of from the individual ones. The scenario is:
- you have a MultiReader pointing to several indexes
- you are querying the MultiReader
- you want to collect facets for the MultiReader
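For context, the setup looks roughly like this (a sketch only; the directory paths and variable names are placeholders, not my real ones):

    // one content reader per index, combined into a single MultiReader
    DirectoryReader reader1 = DirectoryReader.open(FSDirectory.open(new File("/path/to/index1")));
    DirectoryReader reader2 = DirectoryReader.open(FSDirectory.open(new File("/path/to/index2")));
    MultiReader multiReader = new MultiReader(reader1, reader2);

    // all queries go through the MultiReader; facets should be collected
    // across all the underlying indexes at once
    IndexSearcher searcher = new IndexSearcher(multiReader);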
What I'm doing:

-1- taxonomy merging

    long createStart = System.currentTimeMillis();
    catMergeDir = new RAMDirectory();
    readerOrdinalsMap = new HashMap<AtomicReader, DirectoryTaxonomyWriter.OrdinalMap>();
    DirectoryTaxonomyWriter taxoMergeWriter = new DirectoryTaxonomyWriter(catMergeDir);
    Directory taxoDirectory = null;
    IndexReader contentReader = null;
    OrdinalMap[] ordinalMapsArray = new DirectoryTaxonomyWriter.MemoryOrdinalMap[taxoIdxRepoArray.length];
    for (int idx = 0; idx < taxoIdxRepoArray.length; idx++) {
        taxoDirectory = LuceneDirectoryFactory.getDirectory(taxoIdxRepoArray[idx]);
        contentReader = idxReaderArray[idx];
        ordinalMapsArray[idx] = new DirectoryTaxonomyWriter.MemoryOrdinalMap();
        // merge this taxonomy into the combined one and remember the ordinal mapping
        taxoMergeWriter.addTaxonomy(taxoDirectory, ordinalMapsArray[idx]);
        // associate every atomic reader of this content index with its ordinal map
        for (AtomicReaderContext readerCtx : contentReader.leaves()) {
            readerOrdinalsMap.put(readerCtx.reader(), ordinalMapsArray[idx]);
        }
    }
    taxoMergeWriter.close();
    log.info(String.format("Taxonomy merge time elapsed: %s(ms)", System.currentTimeMillis() - createStart));

------

From the code above I'm holding:
- catMergeDir: the directory containing the merged categories
- readerOrdinalsMap: a map containing the ordinal mapping for every AtomicReader in the MultiReader

-2- an aggregator based on the ordinal maps built in -1-

    class OrdinalMappingCountingAggregator extends CountingAggregator {

        private int[] ordinalMap;

        public OrdinalMappingCountingAggregator(int[] counterArray) {
            super(counterArray);
        }

        @Override
        public void aggregate(int docID, float score, IntsRef ordinals) throws IOException {
            int upto = ordinals.offset + ordinals.length;
            for (int i = ordinals.offset; i < upto; i++) {
                // original ordinal read for the AtomicReader given to setNextReader
                int ordinal = ordinals.ints[i];
                // mapped ordinal, following the taxonomy merge
                int mappedOrdinal = ordinalMap[ordinal];
                // count the mapped ordinal instead, so all AtomicReaders count into the same slot
                counterArray[mappedOrdinal]++;
            }
        }

        @Override
        public boolean setNextReader(AtomicReaderContext ctx) throws IOException {
            if (readerOrdinalsMap.get(ctx.reader()) == null) {
                return false;
            }
            ordinalMap = readerOrdinalsMap.get(ctx.reader()).getMap();
            return true;
        }
    }

-3- override CountFacetRequest.createAggregator(..) to return the aggregator from -2-

    return new CountFacetRequest(cp, maxCount) {
        @Override
        public Aggregator createAggregator(boolean useComplements, FacetArrays arrays, TaxonomyReader taxonomy) {
            int[] a = arrays.getIntArray();
            return new OrdinalMappingCountingAggregator(a);
        }
    };

--------

In 4.2 this no longer works, and I'm not collecting facet values from the merged taxonomy.

First problem I realized: the new API FacetsCollector.create(FacetSearchParams fsp, IndexReader indexReader, TaxonomyReader taxoReader) returns collectors and accumulators that never call FacetRequest.createAggregator(). You have to use FacetsCollector.create(FacetsAccumulator accumulator) instead, passing it a StandardFacetsAccumulator (the only accumulator that still calls FacetRequest.createAggregator(..)); see the P.S. below for how I'm wiring this.

Second problem: even when using the StandardFacetsAccumulator it does not work, because the facet counts are wrong.

Any advice on why this is happening? I'm also going to look at how to apply this idea to mimic the behaviour of FastCountingFacetsAggregator, which I think should be the right way to go.

I hope I gave enough information; any help in better understanding how facets changed in 4.2 would be appreciated.

Nicola.
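P.S. For completeness, this is roughly how I'm wiring things in 4.2. It's only a sketch: mappedCountRequest stands for the anonymous CountFacetRequest from step -3-, searcher/query/multiReader/catMergeDir come from the scenario above, and I'm assuming the StandardFacetsAccumulator(FacetSearchParams, IndexReader, TaxonomyReader) constructor, so please correct me if I'm reading the 4.2 API wrong:

    // open the merged taxonomy written in step -1-
    TaxonomyReader mergedTaxoReader = new DirectoryTaxonomyReader(catMergeDir);

    // the facet request overridden in step -3-, so it returns the ordinal-mapping aggregator
    FacetSearchParams fsp = new FacetSearchParams(mappedCountRequest);

    // StandardFacetsAccumulator is, as far as I can tell, the only accumulator
    // in 4.2 that still calls FacetRequest.createAggregator(..)
    StandardFacetsAccumulator accumulator = new StandardFacetsAccumulator(fsp, multiReader, mergedTaxoReader);
    FacetsCollector facetsCollector = FacetsCollector.create(accumulator);

    // collect facets over the whole MultiReader
    searcher.search(query, facetsCollector);
    List<FacetResult> results = facetsCollector.getFacetResults();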