Re: Duplicate values in search

2016-01-04 Thread Ivan Brusic

RE: Duplicate values in search

2015-12-31 Thread Uwe Schindler
day, December 31, 2015 3:18 AM
> To: java-user@lucene.apache.org
> Subject: Re: Duplicate values in search
>
> I potentially found the issue, but I am wondering why the code worked in
> the first place. Did the contract for the scorer change with Lucene 5?
>
> The issue w

Re: Duplicate values in search

2015-12-30 Thread Ivan Brusic
To partially answer my question, one key difference is in DefaultBulkScorer:

Lucene 4.10

    public boolean score(Collector collector, int max) throws IOException {
      ...
      if (max == DocIdSetIterator.NO_MORE_DOCS) {
        scoreAll(collector, scorer);
        return false;
      }

Re: Duplicate values in search

2015-12-30 Thread Ivan Brusic
I potentially found the issue, but I am wondering why the code worked in the first place. Did the contract for the scorer change with Lucene 5? The issue was that underneath, each sub scorer had a posting enum and the initial document was not consumed on the first pass. Inside the DefaultBulkScor
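
For readers following along, here is a rough sketch of how a scorer that pre-reads its first document can emit that document twice. It is not the code from the gist and uses no Lucene classes: `PostingsLike`, `EagerScorer`, and both driver methods are hypothetical plain-Java stand-ins. The only convention borrowed from Lucene is that an iterator reports -1 until it is advanced. The sketch makes no claim about which Lucene version uses which driver loop; it only shows that the same scorer can look correct under one loop and return a duplicate under another.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; every type here is a hypothetical stand-in.
final class DuplicateDocSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal doc-ID iterator: reports -1 until nextDoc() is first called. */
    static final class PostingsLike {
        private final int[] docs;
        private int idx = -1;

        PostingsLike(int... docs) { this.docs = docs; }

        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() {
            if (idx < docs.length) idx++;
            return docID();
        }
    }

    /** Hypothetical scorer that positions its sub-iterator in the constructor. */
    static final class EagerScorer {
        private final PostingsLike sub;
        private boolean firstCall = true;

        EagerScorer(PostingsLike sub) {
            this.sub = sub;
            sub.nextDoc(); // the first document is read here, before any caller asks
        }

        int docID() { return sub.docID(); }

        int nextDoc() {
            if (firstCall) { // hand back the pre-read document without advancing
                firstCall = false;
                return sub.docID();
            }
            return sub.nextDoc();
        }
    }

    /** Driver that always starts by calling nextDoc(). */
    static List<Integer> driveFromNextDoc(EagerScorer s) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = s.nextDoc(); doc != NO_MORE_DOCS; doc = s.nextDoc()) {
            hits.add(doc);
        }
        return hits;
    }

    /** Driver that trusts docID() first and only advances if unpositioned. */
    static List<Integer> driveFromDocID(EagerScorer s) {
        List<Integer> hits = new ArrayList<>();
        int doc = s.docID();
        if (doc < 0) doc = s.nextDoc();
        while (doc != NO_MORE_DOCS) {
            hits.add(doc);
            doc = s.nextDoc();
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(driveFromNextDoc(new EagerScorer(new PostingsLike(2, 5, 9)))); // [2, 5, 9]
        System.out.println(driveFromDocID(new EagerScorer(new PostingsLike(2, 5, 9))));   // [2, 2, 5, 9]
    }
}
```

In this sketch the fix would be to leave the sub-iterator unpositioned until the first `nextDoc()` call, so that `docID()` and `nextDoc()` never disagree about whether the first document has already been handed out.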

Re: Duplicate values in search

2015-12-29 Thread Ivan Brusic
Thanks Adrien. I added the BaseScorer to the gist, but what I was hoping to find out was which direction I should go in to debug this issue. I was not focusing on the scorers since I did not need to upgrade them and I actually do not think I ever wrote my own Scorer in Lucene. Taking the next few days

Re: Duplicate values in search

2015-12-28 Thread Adrien Grand
Ivan, I can't find the BaseScorer class in the gist. Maybe you forgot to git add it?

On Mon, Dec 28, 2015 at 23:07, Ivan Brusic wrote:
> Here is the complete code:
> https://gist.github.com/brusic/e3018a2e403f5707fa3e
>
> The code is not originally mine, so I do not take responsibility. Once I

Re: Duplicate values in search

2015-12-28 Thread Ivan Brusic
Here is the complete code: https://gist.github.com/brusic/e3018a2e403f5707fa3e The code is not originally mine, so I do not take responsibility. Once I get things to perform correctly, I will do another pass with improvements. Much of the custom code needs to be re-thought. The scorer is one clas

Re: Duplicate values in search

2015-12-28 Thread Adrien Grand
Hi Ivan, It looks like your scorer is emitting the same document twice. Maybe you could try to use AssertingIndexSearcher in your test case; this is the kind of thing that it should catch. The only related Lucene 5 change that I can think of is that Lucene now requires docs to be collected in or
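
To make the "same document twice" symptom concrete, the following is a small sketch of the kind of check that would flag it. It is not AssertingIndexSearcher and does not use any Lucene API: `InOrderCheck` and `strictlyIncreasing` are hypothetical names, and a per-document callback is modelled as a plain `java.util.function.IntConsumer`. The assumption is simply that, within one segment, every collected doc ID should be strictly greater than the previous one.

```java
import java.util.function.IntConsumer;

// Hypothetical stand-in for an order-checking collector; not a Lucene class.
final class InOrderCheck {
    /**
     * Wraps a per-document callback and rejects any doc ID that is not
     * strictly greater than the one seen before it (so duplicates fail too).
     */
    static IntConsumer strictlyIncreasing(IntConsumer delegate) {
        return new IntConsumer() {
            private int lastDoc = -1;

            @Override
            public void accept(int doc) {
                if (doc <= lastDoc) {
                    throw new IllegalStateException(
                        "doc " + doc + " collected after doc " + lastDoc);
                }
                lastDoc = doc;
                delegate.accept(doc);
            }
        };
    }
}
```

Wrapping the real callback this way in a test turns a silent duplicate in the result set into an immediate failure at the point where the second copy is collected, which narrows the search to the scorer that produced it.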

Duplicate values in search

2015-12-28 Thread Ivan Brusic
I just migrated a ton of code from Lucene 4.10 to 5.4. Lots of custom collectors, analyzers, queries, etc. I have migrated other code bases from Lucene before (2->3, 3->4) and I always had one issue I could not eyeball! When using a custom query, I get the same document twice in the result set.