globally unique field

Mike Baranczak Tue, 19 Apr 2005 16:44:53 -0700

First of all, a big thanks to all the Lucene hackers - I've only been using your product for a couple of weeks, and I've been very impressed by what I've seen.

Here's my question: I have an index with a little over 3 million documents in it, with more on the way. Each document has an "URL" field (which is not indexed). I want to guarantee that each URL is unique; that is, when I'm adding a new document, I have to check if another existing document has the same value for the URL field. What's the best way to do it? I can think of two possible approaches:

1 - Open an IndexReader and iterate over all the Documents that it contains, checking the value of the "URL" field for each Document. This seems a little inefficient, since I only care about one field, and I don't want to have to retrieve all of the fields.

2 - Rebuild the index such that the URL field is indexed. Then, I could just do a normal search for the value of the URL. But since the URL field will never be searched under any other circumstances, this seems like kind of a waste of disk space.

I'm sure somebody else has had to do something like this before. Is there a better way to do it than what I've described above? If not, then which of the two approaches will give me the best results?

Thanks in advance.

-MB


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

globally unique field

Reply via email to