[Fwd: Re: FacetedSearch and MultiReader]

Nicola Buso Mon, 21 Jan 2013 09:14:42 -0800

--- Begin Message ---

Hi,


your proposal isn't entirely clear to me.

On Mon, 2013-01-21 at 18:21 +0200, Shai Erera wrote:
> Hi
> 
> 
> First, if it's a one time operation, you can merge the taxonomy
> indexes into one, without merging the content indexes too (but you'll
> need to re-map the ordinals in each content index, by e.g. adding it
> to itself). Not a cheap solution.
I don't think I can do that: I have terabytes of indexes, so it doesn't
seem feasible to me.

> 
> Another option is to merge all taxonomy indexes into one, and obtain
> the OrdinalMap per content index.
How does this solution differ from the previous one? Do I merge the
taxonomy indexes without touching the content indexes?
Is there any documentation explaining how OrdinalMaps are used with
facets, so that I understand what I'm doing?
Suppose I merge several taxonomy indexes in memory (I want facets on
only a subset of my indexes, otherwise it would be too heavy).
I iterate over the taxonomy indexes and merge them with
TaxonomyWriter.addTaxonomy(directory, map)
How do I then obtain the OrdinalMap[M]? (I suppose this is the map
corresponding to my MultiReader.)
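
To make the idea of an ordinal map concrete, here is a minimal sketch in
plain Java. It is not the Lucene API: TaxonomyMergeSketch and its
addTaxonomy(List) method are hypothetical stand-ins, and each source
taxonomy is assumed to be a simple list whose index is the source
ordinal. The point it illustrates is that every taxonomy added to the
merged one yields its own array with map[sourceOrdinal] == mergedOrdinal.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of merging taxonomies (hypothetical, not Lucene code). */
class TaxonomyMergeSketch {
    // Merged taxonomy: category path -> ordinal in the merged taxonomy.
    private final Map<String, Integer> merged = new LinkedHashMap<>();

    /**
     * Adds one source taxonomy. The list index is the source ordinal and
     * the value is the category path. Returns the ordinal map for that
     * source: ordinalMap[sourceOrdinal] == mergedOrdinal.
     */
    int[] addTaxonomy(List<String> sourceCategories) {
        int[] ordinalMap = new int[sourceCategories.size()];
        for (int ord = 0; ord < ordinalMap.length; ord++) {
            String category = sourceCategories.get(ord);
            Integer mergedOrd = merged.get(category);
            if (mergedOrd == null) {
                // First time this category is seen: assign the next ordinal.
                mergedOrd = merged.size();
                merged.put(category, mergedOrd);
            }
            ordinalMap[ord] = mergedOrd;
        }
        return ordinalMap;
    }

    /** Number of distinct categories in the merged taxonomy. */
    int size() {
        return merged.size();
    }
}
```

Under these assumptions there is one map per source taxonomy, and
therefore one per content index, rather than a single map for the whole
MultiReader.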


> Then run the search w/ MultiReader, and when asked to count ordinal M,
> you count ordMap[M] instead.
> 
> You can do so by creating your own Aggregator, and override
> CountFacetRequest.createAggregator().

In CountFacetRequest, where should I use the new OrdinalMap? Should I
use OrdinalMap.getMap to construct the CountingAggregator?
  public Aggregator createAggregator(boolean useComplements,
                                      FacetArrays arrays,
                                      IndexReader reader,
                                      TaxonomyReader taxonomy) {
    // we rely on that, if needed, result is cleared by arrays!
    int[] a = arrays.getIntArray();
    if (useComplements) {
      return new ComplementCountingAggregator(a);
    }
    return new CountingAggregator(a);
  }
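
The "count ordMap[M] instead of M" suggestion can be sketched as follows.
This is plain Java and does not implement Lucene's Aggregator interface:
RemappingCounter and its aggregate(int[]) method are hypothetical, and
the sketch assumes you already know which content index a document came
from, so that the matching ordinal map can be used.

```java
/** Hypothetical counting aggregator that remaps ordinals before counting. */
class RemappingCounter {
    private final int[] counts;      // indexed by merged-taxonomy ordinal
    private final int[] ordinalMap;  // source ordinal -> merged ordinal

    /**
     * @param counts     shared counts array, sized to the merged taxonomy
     * @param ordinalMap map for the content index this counter serves
     */
    RemappingCounter(int[] counts, int[] ordinalMap) {
        this.counts = counts;
        this.ordinalMap = ordinalMap;
    }

    /** Counts the category ordinals stored for one matching document. */
    void aggregate(int[] docOrdinals) {
        for (int ord : docOrdinals) {
            // Translate to the merged ordinal, then count that instead.
            counts[ordinalMap[ord]]++;
        }
    }
}
```

One counter per content index, all writing into the same counts array,
gives totals expressed in merged-taxonomy ordinals.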
> 
> 
> If that's also not an option, then you'll need to do a form of
> distributed search. You'll need to run the search against each
> content/taxonomy index pair, then collect the top-K and merge the
> categories' weights (counts).
> Note though that in this process you may lose some categories that
> should be in the top-K.
I think I can merge in memory at indexing time.
Can you elaborate a bit more on the solution that consists of merging
the taxonomy indexes?


Nicola


> 
> E.g. imagine that categories A(3) and B(2) are returned from index1
> and A(4) and C(3) are returned from index2 (for top-2, numbers in
> parenthesis denote counts).
> 
> And say that category B appears in index2 with count 2. Then it should
> be among the top 2 categories: A(7), B(4), but instead you'll return
> A(7), C(3).
> 
> You can somewhat overcome that by requesting to count c*K, where 'c'
> is an over-counting factor (say 5), and hopefully the true top-K will
> be in the top-5*K of all indexes.
> 
> That too can break under some extreme circumstances, but we've tested
> it once and c=2 was enough for a rather large index.
> However, since your searches are run locally (i.e. you don't transmit
> intermediate results over the wire), you can use a larger 'c'.
> 
> HTH,
> Shai
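
The top-K merge with an over-counting factor described above can be
sketched in plain Java. TopKMergeSketch and its methods are hypothetical
names, and each index is assumed to be reduced to a map from category
label to count. Each index contributes only its top c*K categories, the
counts are summed per label, and the top K of the sums are returned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of merging per-index top-K facet counts. */
class TopKMergeSketch {
    /** Returns up to `limit` entries, highest count first (ties by label). */
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(limit, entries.size())));
    }

    /**
     * Asks each index for its top c*k categories, sums the counts per
     * label, and returns the merged top k formatted as "label(count)".
     */
    static List<String> mergedTopK(List<Map<String, Integer>> perIndexCounts, int k, int c) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> counts : perIndexCounts) {
            for (Map.Entry<String, Integer> e : top(counts, c * k)) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : top(merged, k)) {
            result.add(e.getKey() + "(" + e.getValue() + ")");
        }
        return result;
    }
}
```

With index1 = {A=3, B=2} and index2 = {A=4, C=3, B=2}, calling
mergedTopK with k=2 and c=1 yields A(7), C(3), which is the lossy result
from the example, while c=2 yields A(7), B(4).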

--- End Message ---

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
