Hi, that's exactly what I did :) Works perfectly.
thanks
_gk
- Original Message -
From: "Chris Hostetter" <[EMAIL PROTECTED]>
To:
Sent: Monday, January 30, 2006 5:56 AM
Subject: Re: deleting duplicate documents from my index
: Hi, I'm trying to delete duplicate documents from my index; the unique
: identifier is the document's URL (the "url" field).
:
: My initial thought on how to accomplish this is to open the index via a
: reader, sort the documents by URL, and then iterate through them
: looking for a match w
One way to do this (depending on your system and index size) is to delete
and then re-add every URL you find: the delete removes all copies for that
URL, and the add puts back exactly one. This ensures that every document in
the index is unique, with no need to worry about sorting, iteration, doc_ids,
and the like.
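
As a rough illustration of the delete-then-add idea, here is a minimal
sketch in Python. It uses a toy in-memory index, not the Lucene API:
ToyIndex, delete_by_url and readd_unique are hypothetical names invented
for this example, and keeping the last copy seen for each URL is an
assumption, since the thread does not say which copy should survive.

```python
class ToyIndex:
    """In-memory stand-in for a search index (hypothetical, not Lucene)."""

    def __init__(self):
        self.docs = []  # list of dicts; duplicates are allowed

    def add(self, doc):
        self.docs.append(doc)

    def delete_by_url(self, url):
        """Remove every document whose "url" field matches; return the count removed."""
        before = len(self.docs)
        self.docs = [d for d in self.docs if d["url"] != url]
        return before - len(self.docs)


def readd_unique(index):
    """Delete and re-add each URL once so one document per URL remains."""
    latest = {}
    for doc in index.docs:
        latest[doc["url"]] = doc  # assumption: the last copy seen wins
    for url, doc in latest.items():
        index.delete_by_url(url)  # drops all copies for this URL
        index.add(doc)            # puts back exactly one
    return index


if __name__ == "__main__":
    idx = ToyIndex()
    for u in ["http://a", "http://b", "http://a"]:
        idx.add({"url": u})
    readd_unique(idx)
    print(sorted(d["url"] for d in idx.docs))
```

With a real index the same pattern would be a delete by the "url" term
followed by an add of the document; the exact calls depend on the
library version in use.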
It rebuilds your entire index, but if you have a duplication
Hi, I'm trying to delete duplicate documents from my index; the unique
identifier is the document's URL (the "url" field).
My initial thought on how to accomplish this is to open the index via a reader,
sort the documents by URL, and then iterate through them looking for a
match with the c