Re: IndexReader close listeners and NRT

2013-11-07 Thread Ravikumar Govindarajan
> So, in your code, "reader" is the top-level reader, not the one > segment you are pulling a scorer on (context.reader()). > > So you are building your cache on the top-level reader, not the > segment's reader? Is that intentional? (It's not NRT friendly). Not really. It is an IndexSearcher(Ato

Re: Question about SearcherManager.maybeReopen() method.

2013-11-07 Thread Michael McCandless
The picture didn't come through to the list. If you are really fully re-indexing and replacing the index every time then you should just open a new IndexReader instead of trying to .maybeReopen? Ie, the newly opened reader cannot share any segments with the old one, so you get no benefit from it.

Re: IndexReader close listeners and NRT

2013-11-07 Thread Michael McCandless
On Thu, Nov 7, 2013 at 12:18 PM, Ravikumar Govindarajan wrote: > Thanks Mike. > > If you look at my impl, I am using the getCoreCacheKey() only, but keyed > on a ReaderClosedListener and purging it onClose(). When NRT reopens, > will it invoke the onClose() method for the expired-reader? OK

Re: IndexReader close listeners and NRT

2013-11-07 Thread Ravikumar Govindarajan
Thanks Mike. If you look at my impl, I am using the getCoreCacheKey() only, but keyed on a ReaderClosedListener and purging it onClose(). When NRT reopens, will it invoke the onClose() method for the expired-reader? I saw that FieldCacheImpl is using a CoreClosedListener, whereas I am using

Re: Question about SearcherManager.maybeReopen() method.

2013-11-07 Thread Alexei Morgado
We are not copying index files from one index to another. Will try to explain: 1 - We have a unix script that removes the old physical index and creates a new one several times a day from the database. 2 - The SearcherManager calls maybeReopen in a separate thread from the main application every f

Re: IndexReader close listeners and NRT

2013-11-07 Thread Michael McCandless
Hi, a few comments on quickly looking at the code... It's sort of strange, inside the Weight.scorer() method, to go and build an IndexSearcher and run a whole new search, if the cache entry is missing. Could you instead just do a top-level search, which then populates the cache per-segment? Also

Re: What is the best way to aggregate scores for sets of documents?

2013-11-07 Thread Alan Burlison
On 07/11/2013 13:17, Manuel Amoabeng wrote: Sounds good, but wouldn't the aggregated scores of documents consisting of many sub-documents potentially be greater than the scores of docs with very few sub-documents even if the overall content is equal? I don't pretend to understand Lucene scori

Re: What is the best way to aggregate scores for sets of documents?

2013-11-07 Thread Manuel Amoabeng
Sounds good, but wouldn't the aggregated scores of documents consisting of many sub-documents potentially be greater than the scores of docs with very few sub-documents even if the overall content is equal? Thanks, Manuel On 07.11.2013, at 14:08, Alan Burlison wrote: > On 07/11/2013 10:5

Re: What is the best way to aggregate scores for sets of documents?

2013-11-07 Thread Manuel Amoabeng
Hmm, I am not sure about how it could be achieved but my task is to produce a similar score for articles with similar content but different distribution of this content to text objects. Maybe something like creating a temporary document from the text objects and computing its score instead of ju

Re: What is the best way to aggregate scores for sets of documents?

2013-11-07 Thread Alan Burlison
On 07/11/2013 10:59, Manuel Amoabeng wrote: Is there a way to aggregate the scores for logically connected ScoreDocs so that the result would be similar to the score a single document containing all matched content would have gotten? I did something similar by just post-processing the quer
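The message above is cut off, so the exact post-processing it describes is not known. As a rough sketch of the general idea, the plain-Java example below groups per-text-object hits by the article they belong to and combines their scores. It does not use Lucene: `Hit`, `Mode`, and `aggregate` are hypothetical names, and each hit is assumed to already carry the id of its parent article. Summing tends to favor articles split into many text objects, which is the concern raised elsewhere in this thread; maximum and average are shown as alternatives, not as a recommendation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: aggregate sub-document scores per parent article.
class ScoreAggregationSketch {

    /** One matching text object, tagged with the article it is placed on. */
    static final class Hit {
        final String articleId;
        final float score;

        Hit(String articleId, float score) {
            this.articleId = articleId;
            this.score = score;
        }
    }

    /** How the scores of one article's hits are combined. */
    enum Mode { SUM, MAX, AVG }

    /** Returns one combined score per article, in first-seen order. */
    static Map<String, Float> aggregate(List<Hit> hits, Mode mode) {
        Map<String, Float> combined = new LinkedHashMap<>();
        Map<String, Integer> counts = new HashMap<>();
        for (Hit h : hits) {
            counts.merge(h.articleId, 1, Integer::sum);
            if (mode == Mode.MAX) {
                combined.merge(h.articleId, h.score, Float::max);
            } else {
                // SUM and AVG both start from the running total.
                combined.merge(h.articleId, h.score, Float::sum);
            }
        }
        if (mode == Mode.AVG) {
            // Divide each total by the number of hits for that article.
            for (Map.Entry<String, Float> e : combined.entrySet()) {
                e.setValue(e.getValue() / counts.get(e.getKey()));
            }
        }
        return combined;
    }
}
```

The resulting map can then be sorted by value to rank articles. None of these modes reproduces the score a single merged document would have received, since term statistics and length normalization differ per document.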

Re: IndexReader close listeners and NRT

2013-11-07 Thread Ravikumar Govindarajan
Thanks Mike. Can you help me out with one more question? I have a sample impl as below, where I am adding a ReaderClosedListener to purge the BitSet. When using NRT with applyAllDeletes, old-reader will get closed and new-reader will open. In such a case, will the below impl-cache also be purged

Re: What is the best way to aggregate scores for sets of documents?

2013-11-07 Thread Michael McCandless
Alas, the scoring is very simple: just what you see in the ScoreMode enum. But this is something that we should fix, e.g. we should at least open up a method so the app can do its own score aggregation. What scoring/model do you have in mind? Mike McCandless http://blog.mikemccandless.com On

Re: What is the best way to aggregate scores for sets of documents?

2013-11-07 Thread Manuel Amoabeng
Thanks for pointing me to the lucene-join module. Does the ToParentBlockJoinQuery produce the scores in a more sophisticated way than the ScoreMode enum suggests? Actually finding the related entities is not my problem, I am only having trouble producing scores consistent with the overall conte

Re: What is the best way to aggregate scores for sets of documents?

2013-11-07 Thread Michael McCandless
Maybe the join module fits here? For example you can join "up" to a single parent from multiple child hits. I described one of the options (now called ToParentBlockJoinQuery) here: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html but there is also query-time joining n

What is the best way to aggregate scores for sets of documents?

2013-11-07 Thread Manuel Amoabeng
Hello everybody, I am currently working on an index where the documents only represent parts of the entities that should be searchable: We have text objects indexed as independent documents but actually want to find articles the text objects are placed on. We also need to provide an indication

Re: Question about SearcherManager.maybeReopen() method.

2013-11-07 Thread Michael McCandless
It sounds like you are somehow copying over index files from one index to another? You shouldn't do that; use IW.addIndexes instead. Or maybe give a bigger picture of how your application works with Lucene? Mike McCandless http://blog.mikemccandless.com On Wed, Nov 6, 2013 at 6:46 PM, Alexei

Re: IndexReader close listeners and NRT

2013-11-07 Thread Michael McCandless
You need to call .getCoreCacheKey() on each of the sub-readers (returned by IndexReader.leaves()), to play well with NRT. Typically you'd do so in a context that already sees each leaf, like a custom Filter or a Collector. Mike McCandless http://blog.mikemccandless.com On Thu, Nov 7, 2013 at 1
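To illustrate why a cache keyed per sub-reader plays well with NRT, here is a minimal plain-Java model. It does not use Lucene: `Segment`, `SegmentCache`, and `coreKey` are hypothetical stand-ins for a segment reader, a per-segment cache, and the value returned by getCoreCacheKey(). The sketch assumes that a reopened reader shares unchanged segments with the old one, so only segments not seen before need a fresh cache entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a cache keyed per segment rather than per top-level reader.
class PerSegmentCacheSketch {

    /** Stand-in for one segment reader; not a Lucene class. */
    static final class Segment {
        final String name;
        final Object coreKey = new Object(); // plays the role of getCoreCacheKey()

        Segment(String name) {
            this.name = name;
        }
    }

    /** Cache whose entries survive a reopen as long as the segment survives. */
    static final class SegmentCache {
        private final Map<Object, String> entries = new HashMap<>();
        int builds = 0; // counts how often an entry had to be computed

        /** Returns the cached value for this segment, building it on first use. */
        String get(Segment segment) {
            return entries.computeIfAbsent(segment.coreKey, key -> {
                builds++;
                return "cached bits for " + segment.name;
            });
        }

        /** Would be called from a close listener when the segment goes away. */
        void purge(Object coreKey) {
            entries.remove(coreKey);
        }

        int size() {
            return entries.size();
        }
    }
}
```

In this model, visiting segments `_0` and `_1` and then, after a simulated reopen, `_0`, `_1`, and a new `_2` triggers three builds rather than five. A cache keyed on the top-level reader would instead be rebuilt in full on every reopen.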

Re: Corrupt Index with IndexWriter.addIndexes(IndexReader readers[])

2013-11-07 Thread Michael McCandless
OK, so CheckIndex found that the del files for 3 segments could not be found, e.g. it wanted to open _24xf_9l.del (yet it's _24xf_9k.del that's actually there). I wonder why CheckIndex doesn't report the exc you saw in flush, with that way-future segment (_33gg.cfs): that's weird. But ... I suspe

Re: Corrupt Index with IndexWriter.addIndexes(IndexReader readers[])

2013-11-07 Thread Gili Nachum
Thanks Mike and Uwe. I already reindexed in production, my goal is to get to the root cause to make sure it doesn't happen again. Will remove the flush(). No idea why it's there. Attaching checkIndex.Main() output (why did I bother writing my own output :#) *Output:* Opening index @ C:\\customers\