I will try it. I see there is already a lucene-4.1.0 package (dated 2013/01/21) available for download; do you know if this version will be released soon?

Nicola.

On Tue, 2013-01-22 at 06:20 +0200, Shai Erera wrote:
> Hi Nicola,
>
> What I had in mind is something similar to this, which is possible starting
> with Lucene 4.1, due to changes done to facets (per-segment faceting):
>
> DirectoryTaxonomyWriter master = new DirectoryTaxonomyWriter(masterDir);
> Directory[] origTaxoDirs = new Directory[numTaxoDirs]; // open Directories and store in that array
> OrdinalMap[] ordinalMaps = new OrdinalMap[numTaxoDirs]; // initialize OrdinalMap and store in that array
>
> // now do the merge
> for (int i = 0; i < origTaxoDirs.length; i++) {
>   master.addTaxonomy(origTaxoDirs[i], ordinalMaps[i]);
> }
>
> // now open your readers, and create the important map
> final Map<AtomicReader,OrdinalMap> readerOrdinals = new HashMap<AtomicReader,OrdinalMap>();
> DirectoryReader[] readers = new DirectoryReader[origTaxoDirs.length];
> for (int i = 0; i < origTaxoDirs.length; i++) {
>   DirectoryReader r = DirectoryReader.open(contentDirectories[i]);
>   readers[i] = r; // keep the reader so that MultiReader can wrap it
>   OrdinalMap ordMap = ordinalMaps[i];
>   for (AtomicReaderContext ctx : r.leaves()) {
>     readerOrdinals.put(ctx.reader(), ordMap);
>   }
> }
>
> MultiReader mr = new MultiReader(readers);
>
> // create your FacetRequest (CountFacetRequest) with a custom Aggregator
> FacetRequest fr = new CountFacetRequest(cp, topK) {
>   @Override
>   public Aggregator createAggregator(...)
>   {
>     return new OrdinalMappingAggregator() {
>       int[] ordMap;
>
>       @Override
>       public void setNextReader(AtomicReaderContext context) {
>         ordMap = readerOrdinals.get(context.reader()).getMap();
>       }
>
>       @Override
>       public void aggregate(int docID, float score, IntsRef ordinals) {
>         int upto = ordinals.offset + ordinals.length;
>         for (int i = ordinals.offset; i < upto; i++) {
>           int ordinal = ordinals.ints[i]; // original ordinal read for the AtomicReader given to setNextReader
>           int mappedOrdinal = ordMap[ordinal]; // mapped ordinal, following the taxonomy merge
>           counts[mappedOrdinal]++; // count the mapped ordinal instead, so all AtomicReaders count that ordinal
>         }
>       }
>     };
>   }
> };
>
> While it may look like I wrote actual code to do it, I didn't :). So I
> guess it should work, but I haven't tried it.
> That way, you don't touch the content indexes at all, just the taxonomy
> ones.
>
> Note however that you'll need to do this step every time the taxonomy index
> is updated, and you refresh the TaxoReader instance.
> Also, this will only work if all your indexes are opened in the same JVM
> (which I assume is the case, since you use MultiReader).
>
> If you still don't want to do that, then what Denis wrote above is another
> way to do distributed faceted search, either inside the same JVM or across
> multiple JVMs.
> You obtain the FacetResult from each search and merge the results
> (unfortunately, there's still no tool in Lucene to do that for you).
> Just make sure to ask for a larger K, to ensure that the correct top-K is
> returned (see my previous notes).
>
> Shai
>
> On Tue, Jan 22, 2013 at 4:32 AM, Denis Bazhenov <dot...@gmail.com> wrote:
>
> > We have a similar distributed search system and we ended up with the
> > following scheme. Search replicas (machines where the index resides) build
> > FacetResults based on their index chunk (top N categories with document
> > counts).
> > Later on the results are merged "by hand", summing the relevant
> > categories from different replicas.
> >
> > On Jan 22, 2013, at 3:08 AM, Nicola Buso <nb...@ebi.ac.uk> wrote:
> >
> > > Hi Shai,
> > >
> > > I was thinking of that too, but I'm indexing all indexes in a custom
> > > distributed environment, so at the moment I can't have a single
> > > categories index for all the content indexes at indexing time.
> > > A solution would be to merge all the categories indexes into a single
> > > index and use your solution, but the merge code I see in the examples
> > > also merges the content index and I can't do that.
> > >
> > > I could share the taxonomy if it is possible to merge (I see the resulting
> > > categories indexes are not that big currently), but I would prefer to
> > > have a solution where I can collect the facets over multiple categories
> > > indexes; that way I will be sure the solution will scale better.
> > >
> > > Nicola.
> > >
> > > On Mon, 2013-01-21 at 17:54 +0200, Shai Erera wrote:
> > >> Hi Nicola,
> > >>
> > >> I think that what you're describing corresponds to distributed faceted
> > >> search. I.e., you have N content indexes, alongside N taxonomy
> > >> indexes.
> > >>
> > >> The information that's indexed in each of those sub-indexes does not
> > >> correlate with the other ones.
> > >> For example, say that you index the category "Movie/Drama"; it may
> > >> receive ordinal 12 in index1 and 23 in index2.
> > >>
> > >> If you try to count ordinals using MultiReader, you'll just mess up
> > >> everything.
> > >>
> > >> If you can share a single taxonomy index for all N content indexes,
> > >> then you'll be in a super-simple position:
> > >>
> > >> 1) Open one TaxonomyReader
> > >>
> > >> 2) Execute search with MultiReader and FacetsCollector
> > >>
> > >> It doesn't get simpler than that !
> > >> :)
> > >>
> > >> Before I go into great length describing what you should do if you
> > >> cannot share the taxonomy, let me know if that's not an option for
> > >> you.
> > >>
> > >> Shai
> > >>
> > >> On Mon, Jan 21, 2013 at 5:39 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:
> > >>         Thanks for the reply Uwe,
> > >>
> > >>         we can currently search with MultiReader over all the indexes
> > >>         we have. Now I want to add faceted search, so I created a
> > >>         categories index for every index I currently have.
> > >>         To accumulate the faceted results I now have a MultiReader
> > >>         pointing to all the indexes, and I can create a TaxonomyReader
> > >>         for every categories index I have; the ways I see to obtain
> > >>         FacetResults are:
> > >>         1 - FacetsCollector
> > >>         2 - a FacetsAccumulator implementation
> > >>
> > >>         Suppose I use the second option. I should:
> > >>         - search as usual using the MultiReader
> > >>         - then try to collect all the FacetResults by iterating over my
> > >>           TaxonomyReaders; at every iteration:
> > >>           - I create a FacetsAccumulator using the MultiReader and a
> > >>             TaxonomyReader
> > >>           - I get a list of FacetResult from the accumulator.
> > >>         - when I finish, I should in some way merge all the
> > >>           List<FacetResult> I have.
> > >>
> > >>         I think this solution is not correct, because the doc IDs from
> > >>         the search point into the MultiReader, whereas each
> > >>         TaxonomyReader points to the categories index of a single reader.
> > >>         Nor do I like having to merge all the Lists of FacetResult I
> > >>         retrieve from the accumulators.
> > >>
> > >>         Probably I'm missing something; can somebody clarify how
> > >>         I should collect the facets in this case?
> > >>
> > >>         Nicola.
> > >>
> > >>         On Mon, 2013-01-21 at 16:22 +0100, Uwe Schindler wrote:
> > >>> Just use MultiReader, it extends IndexReader, so you can pass it
> > >>> anywhere where IndexReader can be passed.
> > >>>
> > >>> -----
> > >>> Uwe Schindler
> > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> > >>> http://www.thetaphi.de
> > >>> eMail: u...@thetaphi.de
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Nicola Buso [mailto:nb...@ebi.ac.uk]
> > >>>> Sent: Monday, January 21, 2013 3:59 PM
> > >>>> To: java-user@lucene.apache.org
> > >>>> Subject: FacetedSearch and MultiReader
> > >>>>
> > >>>> Hi all,
> > >>>>
> > >>>> I'm trying to develop faceted search using lucene 4.0 faceting
> > >>>> framework. In our project we are searching on multiple indexes using
> > >>>> lucene MultiReader. How should we use the faceted framework to obtain
> > >>>> FacetResults starting from a MultiReader? all the example I see are
> > >>>> using a "single" IndexReader.
> > >>>>
> > >>>> Nicola.
> > >>>>
> > >>>> ---------------------------------------------------------------------
> > >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> > ---
> > Denis Bazhenov <dot...@gmail.com>
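
For the "merge by hand" route that Denis describes, a minimal sketch might look like the following. It assumes each replica has already turned its FacetResult into a plain map from category path (e.g. "Movie/Drama") to document count; that conversion is not shown. FacetCountMerger and mergeTopK are hypothetical names, not part of Lucene, and Map.merge assumes Java 8 or later. Category paths are used as keys because, as noted in the thread, ordinals from different taxonomy indexes do not line up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not Lucene API): merges per-replica facet counts keyed by category path. */
final class FacetCountMerger {

    private FacetCountMerger() {}

    /**
     * Sums the counts of identical category paths across replicas and returns
     * the k entries with the highest totals. Ties are broken by path so the
     * result order is deterministic.
     */
    static List<Map.Entry<String, Integer>> mergeTopK(List<Map<String, Integer>> replicaCounts, int k) {
        // Accumulate one total per category path.
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> replica : replicaCounts) {
            for (Map.Entry<String, Integer> e : replica.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }

        // Sort by total descending, then by path ascending.
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // Keep only the first k entries (or fewer if there are not enough).
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }
}
```

Calling FacetCountMerger.mergeTopK(listOfReplicaMaps, 10) would return the ten categories with the highest summed counts. As Shai points out, each replica should be asked for more than K categories, since a category that just misses one replica's local top list can still belong in the merged top-K.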