Here's my question: I have an index with a little over 3 million documents in it, with more on the way. Each document has an "URL" field (which is not indexed). I want to guarantee that each URL is unique; that is, when I'm adding a new document, I have to check if another existing document has the same value for the URL field. What's the best way to do it? I can think of two possible approaches:
1 - Open an IndexReader and iterate over all the Documents that it contains, checking the value of the "URL" field for each Document. This seems a little inefficient, since I only care about one field, and I don't want to have to retrieve all of the fields.
2 - Rebuild the index such that the URL field is indexed. Then, I could just do a normal search for the value of the URL. But since the URL field will never be searched under any other circumstances, this seems like kind of a waste of disk space.
I'm sure somebody else has had to do something like this before. Is there a better way to do it than what I've described above? If not, then which of the two approaches will give me the best results?
Thanks in advance.
-MB
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]