I bought your book :) Thanks, I will look into it. On Thu, Dec 4, 2008 at 6:12 PM, Erick Erickson <[EMAIL PROTECTED]>wrote:
> See the class in the docs or Lucene In Action for more > detail, but here's the short form..... > > A Filter is a bitset where each bit's ordinal position stands > for a document. I.e. bit 1 means doc id 1, bit 519 > represents document 519 etc. > > When you pass a filter to one of the search routines that accepts a Filter, > the search is restricted to ONLY those documents with the corresponding > bit set in the Filter. Which sounds like just what you want, but I may be > wrong. > > You construct a Filter by iterating over the terms you care about (see > TermDocs/TermEnum classes). In a nutshell you find the terms you > care about, and for each document that contains that term, set the > bit in your filter. This is actually much faster than you might think. All > this is provided for you in the two classes above, how you use them > depends (tm). Watch that you don't run past the term you care about, I > remember having that happen in one of those classes but the details > escape my aging memory. > > Hope that helps > Erick > > On Thu, Dec 4, 2008 at 4:20 PM, Ian Vink <[EMAIL PROTECTED]> wrote: > > > So, let me get this straight. :) > > > > A Query tells Lucene what to search for. Then a Filter tells lucene what? > > > > I think I'm missing understanding about what a Filter is for. > > > > Ian > > > > > > > > On Thu, Dec 4, 2008 at 9:36 AM, Erick Erickson <[EMAIL PROTECTED] > > >wrote: > > > > > It's generally a bad idea to iterate a Hits object. In fact, Hits > > > is deprecated in recent versions of Lucene. The underlying > > > problem is that the query is re-executed every 100 responses > > > or so. > > > > > > First suggestion, create a Filter by iterating over your > > > docid field and use that in your searches see > > > several of the Searcher.search variants. > > > > > > Second suggestion, use one of the collector classes rather than > > > Hits, e.g. TopDoc*, TopFieldDoc*, whichever suits. > > > > > > > > > Best > > > Erick > > > > > > On Thu, Dec 4, 2008 at 7:59 AM, Ian Vink <[EMAIL PROTECTED]> wrote: > > > > > > > I have documents with this simple schema in Lucene which I can not > > > change. > > > > docid: (int) > > > > contents: (text) > > > > > > > > The user is given a list of 10,000 documents in a tree which they > > select > > > to > > > > search, usually they select 5000 or so. > > > > > > > > I only want to search those 5000 documents. I have the 'id' fields. > > That > > > is > > > > all. > > > > > > > > I do this now: > > > > > > > > Get the 'Hits' for all documents. > > > > Loop through all Hits looking for any 'docid' that is in the 5000 > > > selected > > > > by the user > > > > Add found docs to a collection of found documents and return that to > > the > > > > UI. > > > > > > > > > > > > Is there a better way of doing this? > > > > > > > > > >