Re: FacetedSearch and MultiReader

Denis Bazhenov Mon, 21 Jan 2013 18:33:12 -0800

We have a similar distributed search system, and we ended up with the
following scheme. Search replicas (the machines where the index resides)
build FacetResults from their own index chunk (top N categories with
document counts). The results are then merged "by hand", summing the
counts of matching categories from the different replicas.
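
A minimal sketch of that merge step, in plain Java. It assumes each replica
has already reduced its FacetResults to a map from category path (e.g.
"Movie/Drama") to document count. The class and method names here are
hypothetical and not part of Lucene. Counts are keyed by category path rather
than by ordinal, because ordinals are local to each taxonomy index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): merges per-replica facet counts. */
class FacetMerge {

    /**
     * Sums the counts of identical category paths across replicas and
     * returns the topN entries, highest count first (ties ordered by path).
     */
    static List<Map.Entry<String, Long>> merge(List<Map<String, Long>> perReplica, int topN) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> counts : perReplica) {
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                // Add this replica's count to the running total for the path.
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }

        List<Map.Entry<String, Long>> merged = new ArrayList<>(totals.entrySet());
        merged.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue()); // descending count
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
    }
}
```

One thing to keep in mind with this approach: a category that falls outside a
replica's top N contributes nothing from that replica, so the merged counts
can be lower than the true totals. Asking each replica for more than N
categories reduces that effect.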


On Jan 22, 2013, at 3:08 AM, Nicola Buso <nb...@ebi.ac.uk> wrote:

> Hi Shai,
> 
> I was thinking of that too, but I build all the indexes in a custom
> distributed environment, so at the moment I can't have a single
> categories index for all the content indexes at indexing time.
> One solution would be to merge all the categories indexes into a single
> index and use your solution, but the merge code I see in the examples
> also merges the content index, and I can't do that.
> 
> I could share the taxonomy if it is possible to merge (the resulting
> categories indexes are not that big currently), but I would prefer a
> solution where I can collect the facets over multiple categories
> indexes; that way I can be sure the solution will scale better.
> 
> 
> Nicola.
> 
> 
> On Mon, 2013-01-21 at 17:54 +0200, Shai Erera wrote:
>> Hi Nicola,
>> 
>> 
>> I think that what you're describing corresponds to distributed faceted
>> search. I.e., you have N content indexes, alongside N taxonomy
>> indexes.
>> 
>> The information that's indexed in each of those sub-indexes does not
>> correlate with the other ones.
>> For example, say that you index the category "Movie/Drama", it may
>> receive ordinal 12 in index1 and 23 in index2.
>> 
>> If you try to count ordinals using MultiReader, you'll just mess
>> everything up.
>> 
>> 
>> If you can share a single taxonomy index for all N content indexes,
>> then you'll be in a super-simple position:
>> 
>> 1) Open one TaxonomyReader
>> 
>> 2) Execute search with MultiReader and FacetsCollector
>> 
>> 
>> 
>> It doesn't get simpler than that ! :)
>> 
>> 
>> Before I go into great length describing what you should do if you
>> cannot share the taxonomy, let me know if that's not an option for
>> you.
>> 
>> Shai
>> 
>> 
>> 
>> On Mon, Jan 21, 2013 at 5:39 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:
>>        Thanks for the reply Uwe,
>> 
>>        we can currently search with MultiReader over all the indexes
>>        we have.
>>        Now I want to add faceted search, so I created a categories
>>        index for every index I currently have.
>>        To accumulate the faceted results I now have a MultiReader
>>        pointing to all the indexes, and I can create a TaxonomyReader
>>        for every categories index I have. The only ways I see to
>>        obtain FacetResults are:
>>        1 - FacetsCollector
>>        2 - a FacetsAccumulator implementation
>> 
>>        Suppose I use the second option. I should:
>>        - search as usual using the MultiReader
>>        - then try to collect all the FacetResults by iterating over my
>>          TaxonomyReaders; at every iteration:
>>          - I create a FacetsAccumulator using the MultiReader and a
>>            TaxonomyReader
>>          - I get a list of FacetResult from the accumulator.
>>        - when I finish, I should somehow merge all the
>>          List<FacetResult> I have.
>> 
>>        I think this solution is not correct, because the doc IDs from
>>        the search refer to the MultiReader, whereas each TaxonomyReader
>>        refers to the categories index of a single reader.
>>        I also don't like having to merge all the Lists of FacetResult
>>        I retrieve from the accumulators.
>> 
>>        Probably I'm missing something; can somebody clarify how I
>>        should collect the facets in this case?
>> 
>> 
>>        Nicola.
>> 
>> 
>> 
>>        On Mon, 2013-01-21 at 16:22 +0100, Uwe Schindler wrote:
>>> Just use MultiReader; it extends IndexReader, so you can pass it
>>> anywhere an IndexReader can be passed.
>>> 
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>> 
>>>> -----Original Message-----
>>>> From: Nicola Buso [mailto:nb...@ebi.ac.uk]
>>>> Sent: Monday, January 21, 2013 3:59 PM
>>>> To: java-user@lucene.apache.org
>>>> Subject: FacetedSearch and MultiReader
>>>> 
>>>> Hi all,
>>>> 
>>>> I'm trying to develop faceted search using the Lucene 4.0 faceting
>>>> framework. In our project we are searching on multiple indexes
>>>> using Lucene's MultiReader. How should we use the faceting
>>>> framework to obtain FacetResults starting from a MultiReader? All
>>>> the examples I see use a "single" IndexReader.
>>>> 
>>>> 
>>>> 
>>>> Nicola.
>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 

---
Denis Bazhenov <dot...@gmail.com>






