See below...

On Thu, May 22, 2008 at 5:44 AM, lucene user <[EMAIL PROTECTED]> wrote:
> We have a requirement to inform users on a regular basis of new material on
> which they have expressed interest. How are we to know what is "new" from
> the point of view of a particular user? Our idea is to tag each new item in
> some way (perhaps a date/time stamp in the lucene index indicating when the
> new document was indexed) and remember when the last time we sent out an
> alert to that user.
> How should we tag the documents? With a date/time of indexing stamp? An
> incrementing batch import ID number? Does it matter much?
>
> *I am reminded that ranges of dates and numbers, (as well as wild cards) are
> evaluated as if they were a large OR query covering all the values that
> exist in the index. Lucene only finds exact matches - it does not do
> comparisons. This means that ranges with lots of different values in them
> are bad - and can actually crash with a 'too many clauses' exception if
> there are enough distinct values to push the number of clauses over 1024. Do
> I understand this correctly?*

Yes, but... I think ConstantScoreRangeQuery is your friend here. From the doc:
"It does not have an upper bound on the number of clauses covered in the
range."

The whole expansion thing was designed to work well with scoring, as I
understand it. In cases like this I don't think you care about how the tag
contributes to the score; it's just yes/no. You could create your own Filter
instead, but why bother?

> *How do we handle existing documents which do not have such a new field
> associated with them? Can we provide a default value for the existing
> documents?*

Not that I know of. You can certainly test whether each document you are
returning has the field. Document.get(<field>) returns null if the doc doesn't
have the field, so that should fix you up. But there's no way I know of to
assign a default for a non-existent field.
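
To make the clause-count and missing-field points concrete, here is a rough sketch in plain Java. It is not Lucene code: a TreeSet stands in for the index's term dictionary and Maps stand in for Documents, and every name in it (NewItemAlertSketch, the "indexed" field, the yyyyMMddHHmm stamp format) is made up for illustration. It assumes fixed-width, zero-padded stamps so that string order matches time order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Plain-Java model only; none of these names come from Lucene.
class NewItemAlertSketch {
    // The default clause limit mentioned in the question.
    static final int MAX_CLAUSES = 1024;

    // Builds one stamp per minute for the first `days` days of May 2008,
    // truncated to `width` characters (12 = minute resolution, 8 = day resolution).
    static TreeSet<String> stamps(int days, int width) {
        TreeSet<String> terms = new TreeSet<>();
        for (int day = 1; day <= days; day++) {
            for (int hour = 0; hour < 24; hour++) {
                for (int minute = 0; minute < 60; minute++) {
                    String full = String.format("200805%02d%02d%02d", day, hour, minute);
                    terms.add(full.substring(0, width));
                }
            }
        }
        return terms;
    }

    // Counts the distinct terms in [lower, upper]. An expanding range query
    // would need one OR clause for each of them.
    static int expandedClauses(TreeSet<String> terms, String lower, String upper) {
        return terms.subSet(lower, true, upper, true).size();
    }

    // Returns the docs stamped strictly after lastAlert. A doc without the
    // "indexed" field yields null from get() and is treated as not new.
    static List<Map<String, String>> newSince(List<Map<String, String>> docs, String lastAlert) {
        List<Map<String, String>> fresh = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            String stamp = doc.get("indexed");
            if (stamp != null && stamp.compareTo(lastAlert) > 0) {
                fresh.add(doc);
            }
        }
        return fresh;
    }
}
```

With one stamp per minute over two days, stamps(2, 12) holds 2880 distinct terms, so expandedClauses over the full range is well past MAX_CLAUSES; truncating to day resolution with stamps(2, 8) leaves 2. The count only matters for a range query that expands into clauses. The null check in newSince mirrors the Document.get(<field>) test for documents indexed before the tag existed.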
>
> I did not find the place in the Lucene Documentation where it explains what
> you get when you try to retrieve or search on a field that does not exist in
> the document. I remember it not being a problem, but I couldn't find it. How
> do I do this? What should I read?

Searching on a field that doesn't exist just means that field isn't part of
the scoring. So if you have a search that includes the field as an AND clause,
you'll get no matches. Imagine that each document *did* have such a field with
a value that never matched any value you search on and you'll get the idea.
So, +field1:stuff +nonexistentfield:morestuff will never return any document
that doesn't have a value in nonexistentfield.

Best
Erick

> Thanks!