We have a requirement to inform users on a regular basis of new material on which they have expressed interest. How are we to know what is "new" from the point of view of a particular user? Our idea is to tag each new item in some way (perhaps a date/time stamp in the lucene index indicating when the new document was indexed) and remember when the last time we sent out an alert to that user. How should we tag the documents? With a date/time of indexing stamp? An incrementing batch import ID number? Does it matter much?
*I am reminded that ranges of dates and numbers, (as well as wild cards) are evaluated as if they were a large OR query covering all the values that exist in the index. Lucene only finds exact matches - it does not do comparisons. This means that ranges with lots of different values in them are bad - and can actually crash with a 'too many clauses' exception if there are enough distinct values to push the number of clauses over 1024. Do I understand this correctly?* *How do we handle existing documents which do not have such a new field associated with them? Can we provide a default value for the existing documents? * I did not find the place in the Lucene Documentation where it explains what you get when you try to retrieve or search on a field that does not exist in the document. I remember it not being a problem, but I couldn't find it. How do I do this? What should I read? Thanks!