Re: FacetedSearch and MultiReader

2013-01-21 Thread Shai Erera
Hi Nicola, What I had in mind is something similar to this, which is possible starting with Lucene 4.1, due to changes done to facets (per-segment faceting): DirTaxoWriter master = new DirTaxoWriter(masterDir); Directory[] origTaxoDirs = new Directory[numTaxoDirs]; // open Directories and store i

Re: FacetedSearch and MultiReader

2013-01-21 Thread Denis Bazhenov
We have a similar distributed search system and ended up with the following scheme. Search replicas (the machines where the index resides) build FacetResults from their index chunk (top N categories with document counts). Later the results are merged "by hand", summing the relevant ca
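The "merge by hand" step described above can be sketched in plain Java. This is only an illustration under assumptions, not the poster's code and not the Lucene facet API: each replica's result is modelled as a map from category path to document count, and the class name `FacetCountMerger` and the method `mergeTopN` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: merges per-replica facet counts by summing the
// counts of identical category paths, then keeps the top N categories.
final class FacetCountMerger {

    static List<Map.Entry<String, Integer>> mergeTopN(
            List<Map<String, Integer>> perReplicaCounts, int n) {
        // Sum the counts of each category across all replicas.
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> replica : perReplicaCounts) {
            for (Map.Entry<String, Integer> e : replica.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        // Order by count descending; break ties by category name.
        List<Map.Entry<String, Integer>> merged = new ArrayList<>(totals.entrySet());
        merged.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }

    public static void main(String[] args) {
        // Made-up counts from two replicas, for illustration only.
        Map<String, Integer> replicaA = new HashMap<>();
        replicaA.put("author/alice", 5);
        replicaA.put("author/bob", 2);
        Map<String, Integer> replicaB = new HashMap<>();
        replicaB.put("author/bob", 4);
        replicaB.put("author/carol", 1);

        List<Map<String, Integer>> replicas = new ArrayList<>();
        replicas.add(replicaA);
        replicas.add(replicaB);
        System.out.println(mergeTopN(replicas, 2));
    }
}
```

One caveat with this kind of merge: if each replica reports only its own top N, a category that falls just outside the top N on some replicas is undercounted in the merged result, so replicas may need to return more than N entries.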

Re: Is LogByteSizeMergePolicy deterministic?

2013-01-21 Thread Denis Bazhenov
Can you explain in more detail why that is? We have in-house replication for a Lucene 3.6 index and use the default IndexWriter settings. Everything works fine, except that sometimes (just after optimization, in fact) the index cannot be opened (a segment file is missing on the FS). We tolerate this issue by replicating

Re: FieldCacheTermsFilter performance

2013-01-21 Thread emmanuel Gosse
Hi, We have about 120 filters; half are selective, but some filters are "boolean". It's easy to find where the difference comes from: binarySearchLookup in DocTermsIndexImpl versus StringIndex. In StringIndex, it is just a comparison between Strings: int cmp = lookup[mid].compareTo(key); In DocTermsIn

Re: Tool for Lucene storage recovery

2013-01-21 Thread Erick Erickson
P.S. Or just attach the code without your customized doc recovery stuff with a note about how to carry it forward? That way someone could pick it up if interested and generalize it. Best Erick On Mon, Jan 21, 2013 at 12:37 PM, Erick Erickson wrote: > Maybe do the handling as an overridable metho

Re: Tool for Lucene storage recovery

2013-01-21 Thread Erick Erickson
Maybe do the handling as an overridable method and make it abstract? That would give the skeleton of all the recovery stuff, but then require the user to implement the actual recovery? Just a thought Erick On Mon, Jan 21, 2013 at 9:06 AM, Michał Brzezicki wrote: > I don't think it is possible to

[Fwd: Re: FacetedSearch and MultiReader]

2013-01-21 Thread Nicola Buso
--- Begin Message --- Hi, your proposal is not clear to me. On Mon, 2013-01-21 at 18:21 +0200, Shai Erera wrote: > Hi > > > First, if it's a one time operation, you can merge the taxonomy > indexes into one, without merging the content indexes too (but you'll > need to re-map the ordinals in each

Re: FacetedSearch and MultiReader

2013-01-21 Thread Nicola Buso
Hi Shai, I was thinking of that too, but I'm building all the indexes in a custom distributed environment, so at the moment I can't have a single categories index for all the content indexes at indexing time. One solution would be to merge all the categories indexes into a single index and use your sol

Re: FacetedSearch and MultiReader

2013-01-21 Thread Shai Erera
Hi Nicola, I think that what you're describing corresponds to distributed faceted search. I.e., you have N content indexes, alongside N taxonomy indexes. The information that's indexed in each of those sub-indexes does not correlate with the other ones. For example, say that you index the category

Re: Indexing multiple fields with one document position

2013-01-21 Thread Jack Krupansky
Send the same input text to two different analyzers for two separate fields. The first analyzer emits only the first attribute. The second analyzer emits only the second attribute. The document position in one will correspond to the document position in the other. -- Jack Krupansky -Origi
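The idea of feeding one input to two field-specific analyzers can be illustrated without Lucene. The sketch below is an assumption-laden stand-in, not analyzer code: it invents a hypothetical `word/pos/number` token format, and the class `ParallelFieldTokens` is made up. The list index plays the role of the token position, so entry i of one field lines up with entry i of the other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: one annotated input, two parallel token
// streams. Index i in "pos" and index i in "number" describe the same word.
final class ParallelFieldTokens {
    final List<String> pos = new ArrayList<>();
    final List<String> number = new ArrayList<>();

    // Input format (assumed for this sketch): whitespace-separated
    // tokens of the form word/pos/number, e.g. "cat/noun/sg".
    static ParallelFieldTokens split(String annotated) {
        ParallelFieldTokens out = new ParallelFieldTokens();
        for (String token : annotated.trim().split("\\s+")) {
            String[] parts = token.split("/");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected word/pos/number: " + token);
            }
            out.pos.add(parts[1]);     // what the first "analyzer" would emit
            out.number.add(parts[2]);  // what the second "analyzer" would emit
        }
        return out;
    }

    public static void main(String[] args) {
        ParallelFieldTokens t = split("cat/noun/sg dogs/noun/pl");
        for (int i = 0; i < t.pos.size(); i++) {
            System.out.println(i + ": pos=" + t.pos.get(i) + " number=" + t.number.get(i));
        }
    }
}
```

Because both lists are produced in a single pass over the same tokens, their positions cannot drift apart; with real analyzers the same property would hold only if both emit exactly one token per input token.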

Re: FacetedSearch and MultiReader

2013-01-21 Thread Nicola Buso
Thanks for the reply, Uwe. We can currently search with MultiReader over all the indexes we have. Now I want to add faceted search, so I created a categories index for every index I currently have. To accumulate the faceted results, I now have a MultiReader pointing to all the indexes and I can

RE: FacetedSearch and MultiReader

2013-01-21 Thread Uwe Schindler
Just use MultiReader, it extends IndexReader, so you can pass it anywhere where IndexReader can be passed. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Nicola Buso [mailto:nb...@ebi.ac.uk] > Sent: Mon

FacetedSearch and MultiReader

2013-01-21 Thread Nicola Buso
Hi all, I'm trying to develop faceted search using the Lucene 4.0 faceting framework. In our project we are searching on multiple indexes using Lucene's MultiReader. How should we use the faceting framework to obtain FacetResults starting from a MultiReader? All the examples I see are using a "single" Ind

Re: Tool for Lucene storage recovery

2013-01-21 Thread Michał Brzezicki
I don't think it is possible to simply compile it as a jar since you need to implement handling of recovered documents. -- Michał 2013/1/19 Simon Willnauer > hey, > > do you wanna open a jira issue for this and attach your code? this > might help others too and if the shit hits the fan its good

Re: Inner join in lucene

2013-01-21 Thread Ramprakash Ramamoorthy
On Fri, Jan 18, 2013 at 9:05 PM, Apostolis Xekoukoulotakis < xekou...@gmail.com> wrote: > You can put those fields as a DocValue type of field. They are optimized > for use during search(or join in this case). > > Then create a collector that collects the documents which have the same > value in t

Indexing multiple fields with one document position

2013-01-21 Thread Igor Shalyminov
Hello! When indexing text with position data, one just adds a field to a document in the form of its name and value, and the indexer assigns it a unique position in the index. I wonder, if I have an entry with two attributes, say: cat, How do I store in the index two fields, "pos" and "number" wit