Sorry I didn't explain myself well. The problem I am trying to address is the following: Think about the case where you have 100,000 documents indexed. Take the word 'a': if it appears in 80,000 documents, you want the score to take that into account. Now suppose you only want to see how close 20,000 of those documents are to a query, and only 10,000 of these contain the word 'a'. 80,000 / 100,000 (the 'statistics' of the whole index) is quite different from 10,000 / 20,000 (the 'statistics' of only that group of documents): 'a' is in 80% of the whole index but only 50% of the group, so its idf is lower over the whole index than within the group. So it does affect the score whether I use the whole index or just the documents I am interested in. It might be that the order of these desired documents will not change, but I don't see how you can guarantee it, since the idf value can be really different.
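To put numbers on the example above, here is a small sketch of the arithmetic. It assumes the idf shape used by Lucene's DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1)); the class and method names are made up for illustration, and a custom Similarity would give different values.

```java
public class IdfComparison {
    // Assumed formula, same shape as DefaultSimilarity.idf:
    // 1 + ln(numDocs / (docFreq + 1)).
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // Word 'a' in 80,000 of the 100,000 indexed documents.
        double wholeIndex = idf(80_000, 100_000);
        // Word 'a' in 10,000 of the 20,000 documents of interest.
        double groupOnly = idf(10_000, 20_000);

        System.out.printf("idf over the whole index: %.3f%n", wholeIndex);
        System.out.printf("idf over the group only:  %.3f%n", groupOnly);
    }
}
```

Under that assumption the two values come out at roughly 1.22 and 1.69, so the same term would be weighted noticeably differently depending on which document set supplies the counts.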
So, I want the documents for my query to be ranked *relative to each other*, AND NOT restricted to only the documents I care about. For that case, I need to use the filter, right? It's fine if I get the results in document ID order - then I open these using IndexReader to get the fields I need.

Could you please give me an example of how I create the Filter that keeps only a given list of ids?

Thanks!
Liat

2009/5/18 Erick Erickson <erickerick...@gmail.com>

> I'm still unclear what you want the statistics *for*. "statistics"
> are pretty meaningless as far as I understand. The whole point
> of scoring is to use various "statistics" to *rank* documents *for
> a specific query*. You cannot, for instance, compare scores
> between different queries in any meaningful way.
>
> If you're saying that you want the documents for your query to be
> ranked *relative to each other*, but restricted to only the documents
> you care about, then I think you need a HitCollector
> because a Filter (last I knew) doesn't score documents and therefore
> won't order them.
>
> But asking if the statistics reflect the whole index just isn't making any
> sense to me. If you're asking that question I suspect that there's
> something about your problem space I don't understand and
> you're not explaining simply enough for me to grasp <G>.
>
> So forget a Filter because you'll get the documents back in
> (probably, but my memory is weak some days) document ID
> order. Implement a HitCollector whose collect method only
> sets bits for docs in your list. See:
>
> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/HitCollector.html#collect(int,%20float)
>
> Best
> Erick
>
> On Sun, May 17, 2009 at 3:57 AM, liat oren <oren.l...@gmail.com> wrote:
>
> > Yes, this is what I need - I don't need to get the scores for the documents
> > that were filtered.
> > The statistics I meant are idf(t) for example.
> > I want these to include the whole index of course.
> > It will include this info of all the index, right?
> >
> > If I have a list of ids that the query should look at, which Filter should I use?
> >
> > Thanks a lot,
> > Liat
> >
> > 2009/5/14 Erick Erickson <erickerick...@gmail.com>
> >
> > > Hmmm, come to think of it, if you pass the Filter to the search I *think* you
> > > don't get scores for that clause, but you may want to
> > > check it out...
> > >
> > > So I think you should think about implementing a HitCollector
> > > and collect only the documents you care about.
> > >
> > > This is really very little extra work since all the documents have
> > > to be evaluated anyway.
> > >
> > > I'm not sure what you mean by statistics for the whole index. I suspect
> > > you're wondering if the scores reflect all the documents. But you don't
> > > care because scores are not relevant between different queries, and
> > > if they are calculated only within the query you're running, all the
> > > documents returned have scores that rank them relative to each other.
> > >
> > > Best
> > > Erick
> > >
> > > On Thu, May 14, 2009 at 9:16 AM, liat oren <oren.l...@gmail.com> wrote:
> > >
> > > > Yes, I have a pre-defined list of documents that I care about.
> > > > Then I can do the search on these, but it will take the statistics of the
> > > > whole index, right?
> > > >
> > > > 2009/5/14 Erick Erickson <erickerick...@gmail.com>
> > > >
> > > > > I don't know if I'm understanding what you want, but if you have a
> > > > > pre-defined list of documents, couldn't you form a Filter? Then
> > > > > your results would only be the documents you care about.
> > > > >
> > > > > If this is irrelevant, perhaps you could explain a bit more about
> > > > > the problem you're trying to solve.
> > > > >
> > > > > Best
> > > > > Erick
> > > > >
> > > > > On Thu, May 14, 2009 at 5:03 AM, liat oren <oren.l...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have a big index and I want to get, for a specific search, only the grades
> > > > > > of a list of documents.
> > > > > > Is there a better way to get this score than looping over the whole result
> > > > > > set?
> > > > > >
> > > > > > Thanks,
> > > > > > Liat
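
For reference, the suggestion in the thread (a HitCollector whose collect method only keeps docs from a given list) could be sketched roughly as below. This is an illustration under stated assumptions, not a tested Lucene class: it assumes the list holds Lucene's internal document numbers, the names AllowedDocsCollector, ScoredDoc and rankedHits are hypothetical, and the class is kept standalone so it runs without the Lucene jar. In Lucene 2.x it would extend org.apache.lucene.search.HitCollector, whose abstract collect(int doc, float score) has the same signature, and be passed to the searcher together with the query.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Standalone sketch; in Lucene 2.x this would extend
// org.apache.lucene.search.HitCollector (assumption: ids are internal doc numbers).
public class AllowedDocsCollector {

    // Hypothetical holder for one collected hit.
    public static final class ScoredDoc {
        public final int doc;
        public final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final BitSet allowed;                       // documents we care about
    private final List<ScoredDoc> hits = new ArrayList<ScoredDoc>();

    public AllowedDocsCollector(BitSet allowed) {
        this.allowed = allowed;
    }

    // Called once per matching document; ignores anything not in the list.
    public void collect(int doc, float score) {
        if (allowed.get(doc)) {
            hits.add(new ScoredDoc(doc, score));
        }
    }

    // Collected hits ordered by descending score.
    public List<ScoredDoc> rankedHits() {
        List<ScoredDoc> sorted = new ArrayList<ScoredDoc>(hits);
        sorted.sort((a, b) -> Float.compare(b.score, a.score));
        return sorted;
    }

    public static void main(String[] args) {
        BitSet allowed = new BitSet();
        allowed.set(3);
        allowed.set(7);
        AllowedDocsCollector collector = new AllowedDocsCollector(allowed);
        // Simulated callbacks; a real searcher would make these calls.
        collector.collect(1, 0.9f);
        collector.collect(3, 0.2f);
        collector.collect(7, 0.5f);
        for (ScoredDoc hit : collector.rankedHits()) {
            System.out.println("doc " + hit.doc + " score " + hit.score);
        }
    }
}
```

The collector only decides which hits to keep. The scores it receives are whatever the searcher computed, so it does not change which documents the idf counts are taken from. In the simulated run, doc 1 is dropped and doc 7 ranks ahead of doc 3.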