RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-03-08 Thread saisantoshi
Could someone please comment on the above? Thanks, Sai -- View this message in context: http://lucene.472066.n3.nabble.com/TopDocCollector-vs-TopScoreDocCollector-semantics-changed-in-4-0-not-backward-comptabile-tp4035806p4045855.html Sent from the Lucene - Java Users mailing list archive at N

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-03-06 Thread saisantoshi
Thanks for the response; I really appreciate your help. I had read the documentation but did not understand it on the first read, as I am new to Lucene. I have changed it to AtomicReader and it seems to be working fine. One last clarification: do we also need to use AtomicReader for the following b…

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-03-01 Thread Michael Sokolov
On 03/01/2013 07:56 AM, Uwe Schindler wrote: The slowdown happens not on making the doc ids absolute (it is just an addition), the slowdown appears when you retrieve the stored fields on the top-level reader (because the composite top-level reader has to do a binary search in the reader tree t
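Uwe's point is that rebasing a segment-local ID is only an addition. Loading stored fields through the composite top-level reader is different: it first has to find which segment owns a top-level ID. The sketch below models that lookup. It assumes segments are laid out back to back, so each segment's docBase is the sum of the sizes before it. `SegmentLookup`, `segmentOf` and `localDoc` are hypothetical names for illustration, not Lucene's API:

```java
/**
 * Toy model of a composite reader's doc ID lookup (not Lucene code).
 * Segments are stored back to back; docBases[i] is the first top-level
 * doc ID that belongs to segment i.
 */
final class SegmentLookup {
    private final int[] docBases;
    private final int maxDoc;

    SegmentLookup(int[] segmentSizes) {
        docBases = new int[segmentSizes.length];
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            docBases[i] = base;
            base += segmentSizes[i];
        }
        maxDoc = base;
    }

    /** Binary search for the last segment whose docBase <= globalDoc. */
    int segmentOf(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= maxDoc) {
            throw new IllegalArgumentException("doc out of range: " + globalDoc);
        }
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so lo always advances
            if (docBases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo; // empty segments share a docBase with the next one and are skipped
    }

    /** Segment-local ID: find the owning segment, then subtract its docBase. */
    int localDoc(int globalDoc) {
        return globalDoc - docBases[segmentOf(globalDoc)];
    }
}
```

With segment sizes {3, 0, 2, 4}, top-level doc 4 resolves to segment 2, local doc 1. A collector that already holds the per-segment reader never pays for this search.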

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-03-01 Thread Uwe Schindler
…> Cc: Uwe Schindler > Subject: Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible) > > On 2/28/2013 5:05 PM, Uwe Schindler wrote: > > ... Collector instead of HitCollector (like your ancient Lucene from 2.4), …

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-03-01 Thread Michael Sokolov
On 2/28/2013 5:05 PM, Uwe Schindler wrote: ... Collector instead of HitCollector (like your ancient Lucene from 2.4), you have to respect the new semantics, which are *different* from the old HitCollector. Collector works with low-level atomic readers (also in Lucene 3.x), the calls to the "collect(in…
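Uwe describes the new semantics: collect() receives IDs relative to the current segment. The same small numbers therefore repeat across segments until the segment's docBase is added back. The toy model below shows this. It assumes a hypothetical `SegmentCollector` interface whose `setNextSegment(int)` stands in for Lucene 4.x's `setNextReader(AtomicReaderContext)`. None of these types are Lucene classes:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the per-segment Collector contract (not the Lucene API). */
interface SegmentCollector {
    void setNextSegment(int docBase); // plays the role of setNextReader(AtomicReaderContext)
    void collect(int localDoc);       // localDoc is relative to the current segment
}

/** Records top-level doc IDs by adding the current segment's docBase. */
final class GlobalIdCollector implements SegmentCollector {
    private int docBase;
    final List<Integer> globalIds = new ArrayList<>();

    @Override public void setNextSegment(int docBase) { this.docBase = docBase; }

    @Override public void collect(int localDoc) { globalIds.add(docBase + localDoc); }
}

/** Toy search driver: visits each segment's matching local IDs in order. */
final class ToySearch {
    static void run(int[][] matchesPerSegment, int[] segmentSizes, SegmentCollector c) {
        int docBase = 0;
        for (int s = 0; s < segmentSizes.length; s++) {
            c.setNextSegment(docBase);
            for (int localDoc : matchesPerSegment[s]) {
                c.collect(localDoc);
            }
            docBase += segmentSizes[s];
        }
    }
}
```

Take two segments of sizes 3 and 2 that both report local doc 0. They yield top-level IDs 0 and 3. If the addition is skipped, the two IDs collide, which is one plausible source of the "inconsistent results" reported later in this thread.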

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-02-28 Thread Uwe Schindler
…> -Original Message- > From: saisantoshi [mailto:saisantosh...@gmail.com] > Sent: Thursday, February 28, 2013 10:55 PM > To: java-user@lucene.apache.org > Subject: RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible) > > Thanks a lot. Really a…

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-02-28 Thread saisantoshi
Thanks a lot. I really appreciate your help here. I have read through the document and understand that the IndexReader uses sub-readers (to look into the index files) and AtomicReader does not. But how does this affect things from a search standpoint? I think search results should be consistent…

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-02-28 Thread Uwe Schindler
Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: saisantoshi [mailto:saisantosh...@gmail.com] > Sent: Thursday, February 28, 2013 7:26 PM > To: java-user@lucene.apache.org > Subject: RE

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-02-28 Thread saisantoshi
Could someone please comment on the code snippet above? Also, one observation: our search results differ depending on whether we use *IndexReader or AtomicReader*. Could this be a problem? Thanks, Sai.

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-02-27 Thread saisantoshi
Here is how I am using it: public class MyCollector extends PositiveScoresOnlyCollector { private IndexReader indexReader; public MyCollector(IndexReader indexReader, PositiveScoresOnlyCollector topScore) { super(topScore); this.indexReader = indexReader;

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-02-27 Thread Uwe Schindler
…> -Original Message- > From: saisantoshi [mailto:saisantosh...@gmail.com] > Sent: Wednesday, February 27, 2013 11:51 PM > To: java-user@lucene.apache.org > Subject: RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible) > > Thanks. Is there any issu…

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-02-27 Thread saisantoshi
Thanks. Is there any issue with the way we are calling indexReader.getDocument(doc)? I am not sure how to get an AtomicReaderContext in the method below. Any pointers on how to get that instance would be appreciated. public void collect(int doc) throws IOException { // ADD YOUR CUSTOM LOGIC…
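Simon's reply (2013-01-25) answers this: the per-segment reader is handed to the collector through setNextReader, and the collector only has to keep it. The toy model below shows that pattern. It assumes hypothetical `ToySegment` and `LoadingCollector` classes. In Lucene 4.x, as I understand it, the real context is an `AtomicReaderContext` exposing `reader()` and `docBase`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical segment holding stored documents; stands in for an AtomicReader. */
final class ToySegment {
    private final List<String> storedDocs;

    ToySegment(String... storedDocs) {
        this.storedDocs = Arrays.asList(storedDocs);
    }

    /** Loads a stored document by its segment-local ID. */
    String document(int localDoc) {
        return storedDocs.get(localDoc);
    }
}

/** Remembers the current segment and loads each hit from it. */
final class LoadingCollector {
    private ToySegment current;
    final List<String> loaded = new ArrayList<>();

    // Analogue of setNextReader(AtomicReaderContext): called once per segment,
    // before any collect() calls for that segment.
    void setNextSegment(ToySegment segment) {
        this.current = segment;
    }

    void collect(int localDoc) {
        // The ID is local, so resolve it against the segment it came from,
        // not against the top-level reader.
        loaded.add(current.document(localDoc));
    }
}
```

Suppose the collector instead looked up local doc 0 of the second segment in a flat list of all documents. It would get "a" rather than "c", which is the kind of mismatch discussed in this thread.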

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-02-27 Thread Uwe Schindler
…> -Original Message- > From: saisantoshi [mailto:saisantosh...@gmail.com] > Sent: Wednesday, February 27, 2013 10:39 PM > To: java-user@lucene.apache.org > Subject: Re: TopDocCollector vs TopSc…

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-02-27 Thread saisantoshi
I want to get the Document in the code below, and that's why I need an indexReader: public void collect(int doc) throws IOException { // ADD YOUR CUSTOM LOGIC HERE *Document doc = indexReader.document(doc)* delegate.collect(doc); } But this seems to be the problem as the in…

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-01-25 Thread saisantoshi
I am not looking for negative scores and want to skip them. Thanks, Sai

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-01-25 Thread Simon Willnauer
On Fri, Jan 25, 2013 at 3:29 PM, saisantoshi wrote: > Thanks a lot. If we want to wrap TopScoreDocCollector into > PositiveScoresOnlyCollector. Can we do that? > I need only positive scores and I dont think topscore collector can handle > by itself right? > I guess so! But how do you get neg. sco

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-01-25 Thread saisantoshi
Thanks a lot. Can we wrap TopScoreDocCollector in a PositiveScoresOnlyCollector? I need only positive scores, and I don't think TopScoreDocCollector can handle that by itself, right? Thanks, Sai

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-01-25 Thread Simon Willnauer
hey, you don't need to set the IndexReader in the constructor. An AtomicReader is passed in for each segment to Collector#setNextReader(AtomicReaderContext). If you want to use a given collector and extend it with some custom code in collect, I would likely write a delegate Collector like this: pub…
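Simon's delegate Collector is cut off above. The sketch below shows the same wrapping idea. It assumes hypothetical `ToyCollector` and `ToyScorer` interfaces shaped after Lucene 4.x's Collector methods (`setScorer`, `setNextReader`, `collect`). The real abstract class also declares `acceptsDocsOutOfOrder()`, which is omitted here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Hypothetical score source, analogous to Lucene's Scorer (not the real class). */
interface ToyScorer { float score(); }

/** Hypothetical collector contract modelled on Lucene 4.x's Collector methods. */
interface ToyCollector {
    void setScorer(ToyScorer scorer);
    void setNextSegment(int docBase);
    void collect(int localDoc);
}

/** Wraps any collector, runs custom logic per hit, then forwards every call unchanged. */
final class DelegatingCollector implements ToyCollector {
    private final ToyCollector delegate;
    private final IntConsumer customLogic; // receives the segment-local doc ID

    DelegatingCollector(ToyCollector delegate, IntConsumer customLogic) {
        this.delegate = delegate;
        this.customLogic = customLogic;
    }

    @Override public void setScorer(ToyScorer scorer) { delegate.setScorer(scorer); }

    @Override public void setNextSegment(int docBase) { delegate.setNextSegment(docBase); }

    @Override public void collect(int localDoc) {
        customLogic.accept(localDoc); // "ADD YOUR CUSTOM LOGIC HERE"
        delegate.collect(localDoc);
    }
}

/** Simple target collector that records "globalDoc:score" strings. */
final class RecordingCollector implements ToyCollector {
    private ToyScorer scorer;
    private int docBase;
    final List<String> hits = new ArrayList<>();

    @Override public void setScorer(ToyScorer scorer) { this.scorer = scorer; }

    @Override public void setNextSegment(int docBase) { this.docBase = docBase; }

    @Override public void collect(int localDoc) {
        hits.add((docBase + localDoc) + ":" + scorer.score());
    }
}
```

Every call is forwarded, so the wrapped collector (TopScoreDocCollector in this thread) keeps its own behaviour. The custom code sees the same segment-local IDs the wrapped collector does.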

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-01-24 Thread saisantoshi
Can someone please help us validate the above? Thanks, Sai.

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-01-23 Thread saisantoshi
Here is the way I implemented a collector class. I'd appreciate it if you could let me know of any issues. public class MyCollector extends PositiveScoresOnlyCollector { private IndexReader indexReader; public MyCollector (IndexReader indexReader,PositiveScoresOnlyCollector topScor…

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-01-23 Thread saisantoshi
I am sorry, but I am confused by the change logs and the enhancements made, since we are jumping from 2.4 to 4.0. Could you please point me to any example code that extends one of the new collectors? That would help a lot, or it would be great if you could give some pointers on how we can mo…

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-01-23 Thread Uwe Schindler
This changed in Lucene 2.9; it's nothing new in Lucene 4.0. Read the change logs of Lucene 2.9/3.0, which explain what you need to do. Uwe
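The 2.9 change Uwe refers to replaced pushing `(doc, score)` into `HitCollector.collect` with a Collector that is handed a scorer and receives per-segment local IDs. The minimal adapter sketch below assumes exactly that and lets old callback logic run unchanged. `LegacyHitCallback`, `ScoreSource`, `PerSegmentCollector` and `LegacyAdapter` are hypothetical stand-ins, not Lucene types:

```java
/** Old-style callback, shaped like Lucene 2.4's HitCollector.collect(int doc, float score). */
interface LegacyHitCallback { void collect(int doc, float score); }

/** Hypothetical stand-in for Scorer: the new style pulls the score on demand. */
interface ScoreSource { float score(); }

/** Hypothetical new-style contract: scorer and segment are pushed in; collect() gets a local ID. */
interface PerSegmentCollector {
    void setScorer(ScoreSource scorer);
    void setNextSegment(int docBase);
    void collect(int localDoc);
}

/** Adapts legacy (doc, score) logic to the per-segment contract. */
final class LegacyAdapter implements PerSegmentCollector {
    private final LegacyHitCallback legacy;
    private ScoreSource scorer;
    private int docBase;

    LegacyAdapter(LegacyHitCallback legacy) { this.legacy = legacy; }

    @Override public void setScorer(ScoreSource scorer) { this.scorer = scorer; }

    @Override public void setNextSegment(int docBase) { this.docBase = docBase; }

    @Override public void collect(int localDoc) {
        // Rebase to a top-level ID and fetch the score, which the old API passed in directly.
        legacy.collect(docBase + localDoc, scorer.score());
    }
}
```

The adapter keeps old code working, but it still hands out top-level IDs. Code that loads documents is better served by reading from the current segment directly.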

Re: TopDocCollector limits

2009-09-30 Thread Mark Miller
Way the heck better - Hits is horrible for that. It caches something like 100 hits and then keeps searching when you exhaust the cache (it's been a while since I've looked at the exact numbers). It's horribly inefficient for checking every hit. Hits will end up using a Collector anyway - and then throw a speed tr…
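For the "check every hit" use case, the collector approach amounts to a callback that records each match as the search produces it, with no cap on results. The minimal sketch below assumes a hypothetical `HitCallback` interface shaped like Lucene 2.4's `HitCollector.collect(int doc, float score)`. A `java.util.BitSet` keeps memory proportional to the highest matching doc ID rather than to a fixed numHits:

```java
import java.util.BitSet;

/** Shaped like Lucene 2.4's HitCollector callback; a stand-in, not the real class. */
interface HitCallback { void collect(int doc, float score); }

/** Keeps every matching doc ID (no numHits cap) plus the number of hits seen. */
final class AllHitsCollector implements HitCallback {
    final BitSet matches = new BitSet();
    int count;

    @Override public void collect(int doc, float score) {
        matches.set(doc); // record each hit as the search finds it
        count++;
    }
}
```

Per-hit processing can go directly in `collect`. Unlike the Hits cache Mark describes, nothing here re-runs the search.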

Re: TopDocCollector limits

2009-09-30 Thread Max Lynch
Thanks Mark, that's exactly what I need. How does the performance of processing each document in the collect method of HitCollector compare to looping through the Hits in the deprecated Hits class? On Tue, Sep 29, 2009 at 7:40 PM, Mark Miller wrote: > Max Lynch wrote: > > Hi, > > I am developing…

Re: TopDocCollector limits

2009-09-29 Thread Mark Miller
Max Lynch wrote: > Hi, > I am developing a search system that doesn't do pagination (searches are run > in the background and machine analyzed). However, TopDocCollector makes me > put a limit on how many results I want back. For my system, each result > found is important. How can I make it col

Re: TopDocCollector

2009-02-28 Thread Yonik Seeley
On Sat, Feb 28, 2009 at 7:51 AM, wrote: >> Solr has always allowed all scores through w/o screening out <=0 > > Why? Partially historical... due to some limitations in Lucene back when Solr was first written (like undesired score normalization), Solr interfaces with Lucene search at the hit coll

RE: TopDocCollector

2009-02-28 Thread spring
> > * How can a hit have a score of <=0? > > A function query, or a negative boost would do it. Ah ok. > Solr has always allowed all scores through w/o screening out <=0 Why? - To unsubscribe, e-mail: java-user-unsubscr...@lu

RE: TopDocCollector

2009-02-28 Thread spring
> That works fine, because hq.size() is still less than numHits. So > no matter what, the first numHits hits will be added to the queue. > > > public void collect(int doc, float score) { > > if (score > 0.0f) { > > if (hq.size() < numHits || score >= minScore) { Oh damned... it'…

Re: TopDocCollector

2009-02-27 Thread Yonik Seeley
On Fri, Feb 27, 2009 at 6:43 AM, wrote: > Looking into TopDocCollector code, I have some questions: > > * How can a hit have a score of <=0? A function query, or a negative boost would do it. Solr has always allowed all scores through w/o screening out <=0 -Yonik http://www.lucidimagination.co

Re: TopDocCollector

2009-02-27 Thread Michael McCandless
…wrote: Looking into TopDocCollector code, I have some questions: * How can a hit have a score of <=0? I'm not sure... * What happens if the first hit has the highest score of all hits? It seems that topDocs would then contain only this doc!? That works fine, because hq.size() is sti…
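The quoted TopDocCollector logic can be modelled with a bounded min-heap. While the queue holds fewer than numHits entries, every positive hit is admitted, so an early high-scoring hit cannot shut out later ones. The sketch below models that behaviour and is not Lucene's code. `TopNCollector` is a hypothetical name. Ties are resolved in favour of the earlier hit (a simplification). `totalHits` counts every positive hit, kept or not, which matches Erick's point further down that numHits only limits what is stored:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Simplified model of the quoted TopDocCollector.collect logic (not Lucene's code). */
final class TopNCollector {
    /** (doc, score) pair, like Lucene's ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int numHits;
    // Min-heap: the head is the weakest hit currently kept.
    private final PriorityQueue<Hit> hq =
        new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
    int totalHits;

    TopNCollector(int numHits) {
        if (numHits <= 0) throw new IllegalArgumentException("numHits must be > 0");
        this.numHits = numHits;
    }

    void collect(int doc, float score) {
        if (score > 0.0f) {                       // hits scoring <= 0 are skipped entirely
            totalHits++;                          // counted whether or not the hit is kept
            if (hq.size() < numHits) {
                hq.add(new Hit(doc, score));      // the first numHits hits always go in
            } else if (score > hq.peek().score) {
                hq.poll();                        // evict the current minimum
                hq.add(new Hit(doc, score));
            }                                     // ties keep the earlier hit
        }
    }

    /** Kept hits, best first. */
    List<Hit> topDocs() {
        List<Hit> out = new ArrayList<>(hq);
        out.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return out;
    }
}
```

Feed it scores 9.0, 1.0, 5.0 and -2.0 with numHits = 2. It keeps docs 0 and 2 and reports totalHits = 3. The negative score (for example from a negative boost, as Yonik notes) is neither counted nor kept.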

Re: TopDocCollector vs Hits: TopDocCollector slowing....

2009-02-18 Thread AlexElba
Grant Ingersoll-6 wrote: > > I presume they are both now slower, right? Otherwise you wouldn't > mind the speedup on the bigger one. Hits did caching and prefetched > things, which has its tradeoffs. Can you describe how you were > measuring the queries? How many results were you get…

Re: TopDocCollector vs Hits inquiry

2009-02-05 Thread Jay Malaluan
…wrote: >> Hi, >> As I was reading the post "Re: TopDocCollector vs Hits: TopDocCollector slowing", I just got curious about how he explained his change from Hits to TopDocCollector. I'm assuming that the Hits…

Re: TopDocCollector vs Hits inquiry

2009-02-05 Thread Grant Ingersoll
http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/search/Searcher.html#search(org.apache.lucene.search.Query,%20org.apache.lucene.search.HitCollector) The TopDocCollector is a HitCollector. On Feb 4, 2009, at 10:34 PM, Jay Malaluan wrote: Hi, As I was reading the post "…

Re: TopDocCollector vs Hits inquiry

2009-02-04 Thread Jay Malaluan
Hi, As I was reading the post "Re: TopDocCollector vs Hits: TopDocCollector slowing", I just got curious about how he explained his change from Hits to TopDocCollector. I'm assuming that the Hits object is returned from a call like: Searcher searcher = new Searcher(); searcher.search(x…

Re: TopDocCollector vs Hits: TopDocCollector slowing....

2009-02-04 Thread Grant Ingersoll
I presume they are both now slower, right? Otherwise you wouldn't mind the speedup on the bigger one. Hits did caching and prefetched things, which has its tradeoffs. Can you describe how you were measuring the queries? How many results were you getting? -Grant On Feb 3, 2009, at 8:…

Re: TopDocCollector & Paging

2008-09-17 Thread Chris Hostetter
: I know in applications where we search for a words or phrases and expect : the result sorted by relevance, TopDocCollector would work like a dream. : But what about scenario where the result needs to be sorted : chronologically or by some kind of metadata. These two methods are available, and
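Sorting chronologically is the same top-N idea with a different ordering key. In Lucene itself, the search methods that take a Sort handle this. The class below is only a stdlib illustration and assumes a hypothetical per-document `timestampByDoc` array in place of an indexed date field:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Keeps the N newest hits by a per-document timestamp instead of by score (toy model). */
final class NewestFirstCollector {
    private final int numHits;
    private final long[] timestampByDoc; // hypothetical field values, e.g. sent date in epoch millis
    private final PriorityQueue<Integer> hq; // min-heap on timestamp: head = oldest kept doc

    NewestFirstCollector(int numHits, long[] timestampByDoc) {
        if (numHits <= 0) throw new IllegalArgumentException("numHits must be > 0");
        this.numHits = numHits;
        this.timestampByDoc = timestampByDoc;
        this.hq = new PriorityQueue<Integer>(
            Comparator.comparingLong((Integer d) -> timestampByDoc[d]));
    }

    void collect(int doc) {
        if (hq.size() < numHits) {
            hq.add(doc);
        } else if (timestampByDoc[doc] > timestampByDoc[hq.peek()]) {
            hq.poll(); // drop the oldest kept doc
            hq.add(doc);
        }
    }

    /** Kept doc IDs, newest first. */
    List<Integer> results() {
        List<Integer> out = new ArrayList<>(hq);
        out.sort(Comparator.comparingLong((Integer d) -> timestampByDoc[d]).reversed());
        return out;
    }
}
```

Only the comparator changes relative to relevance ranking. The collector still examines every match and keeps a bounded queue.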

Re: TopDocCollector & Paging

2008-09-17 Thread Grant Ingersoll
On Sep 17, 2008, at 6:53 PM, Dino Korah wrote: Thanks Grant.. Please see my comments/response below. 2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]> On Sep 17, 2008, at 4:39 PM, Dino Korah wrote: I know in applications where we search for a words or phrases and expect the result sorted by

Re: TopDocCollector & Paging

2008-09-17 Thread Dino Korah
Thanks Grant.. Please see my comments/response below. 2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]> > > On Sep 17, 2008, at 4:39 PM, Dino Korah wrote: > > I know in applications where we search for a words or phrases and expect >> the >> result sorted by relevance, TopDocCollector would work lik

Re: TopDocCollector & Paging

2008-09-17 Thread Grant Ingersoll
On Sep 17, 2008, at 4:39 PM, Dino Korah wrote: I know in applications where we search for a words or phrases and expect the result sorted by relevance, TopDocCollector would work like a dream. But what about scenario where the result needs to be sorted chronologically or by some kind of me

Re: TopDocCollector & Paging

2008-09-17 Thread Dino Korah
I know in applications where we search for words or phrases and expect the results sorted by relevance, TopDocCollector would work like a dream. But what about a scenario where the results need to be sorted chronologically or by some kind of metadata? A very common application would be email applic…

Re: TopDocCollector & Paging

2008-09-17 Thread Grant Ingersoll
On Sep 17, 2008, at 11:51 AM, Cam Bazz wrote: And how about queries that need starting position, like hits between 100 and 200? could we pass something to the collector that will count between 0 to 100 and then get the next 100 records? The collector uses a Priority Queue to store doc ids a
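Grant's point that the collector keeps doc IDs in a priority queue implies a cost for paging. Fetching hits 100 to 200 means collecting the top 200 and discarding the first 100. The minimal sketch below shows that arithmetic. `Paging.page` and the `scoreByDoc` array are hypothetical, standing in for a real query's scores:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of paging with a top-N collector: collect start + pageSize, then skip start. */
final class Paging {
    /** Returns the doc IDs ranked start .. start + pageSize - 1 by descending score. */
    static List<Integer> page(float[] scoreByDoc, int start, int pageSize) {
        int numHits = start + pageSize; // the queue must hold every hit up to the page end
        PriorityQueue<Integer> hq = new PriorityQueue<Integer>(
            Comparator.comparingDouble((Integer d) -> scoreByDoc[d])); // min-heap on score
        for (int doc = 0; doc < scoreByDoc.length; doc++) {
            hq.add(doc);
            if (hq.size() > numHits) {
                hq.poll(); // drop the weakest hit once over capacity
            }
        }
        List<Integer> ranked = new ArrayList<>(hq);
        ranked.sort(Comparator.comparingDouble((Integer d) -> scoreByDoc[d]).reversed());
        if (start >= ranked.size()) {
            return Collections.emptyList();
        }
        return ranked.subList(start, ranked.size()); // at most pageSize entries remain
    }
}
```

Deeper pages cost more because the queue must hold `start + pageSize` entries, even though only the last `pageSize` are returned.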

Re: TopDocCollector & Paging

2008-09-17 Thread Cam Bazz
And how about queries that need a starting position, like hits 100 to 200? Could we pass something to the collector that counts from 0 to 100 and then gets the next 100 records? Best. On Wed, Sep 17, 2008 at 5:16 PM, Erick Erickson <[EMAIL PROTECTED]> wrote: > Doesn't TopDocCollecto…

Re: TopDocCollector & Paging

2008-09-17 Thread Erick Erickson
Doesn't TopDocCollector have a getTotalHits method? Remember that in order to get the top N documents, all documents must be examined. I believe that the numHits parameter passed to the constructor just limits the number of hits stored in (and thus the size of) the TopDocs object. Best Erick