
Re: deleting duplicate documents from my index

2006-01-30 Thread gekkokid

Hi, that's exactly what I did :) Works perfectly, thanks.
_gk

- Original Message -
From: "Chris Hostetter" <[EMAIL PROTECTED]>
To:
Sent: Monday, January 30, 2006 5:56 AM
Subject: Re: deleting duplicate documents from my index

: Hi, I'm trying to delete duplicate d

Re: deleting duplicate documents from my index

2006-01-29 Thread Chris Hostetter

: Hi, I'm trying to delete duplicate documents from my index; the unique
: identifier is the document's url (aka field "url").
:
: My initial thought on how to accomplish this is to open the index via a
: reader, sort the documents by url, and then iterate through them
: looking for a match w

Re: deleting duplicate documents from my index

2006-01-29 Thread Jeff Rodenburg

One way to do this (depending on your system and index size) is to delete and re-add the document for every url you find. This ensures that each url appears in the index exactly once. No need to worry about sorting, iteration, doc_ids and the like. It rebuilds your entire index, but if you have a duplication
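A minimal sketch of the "delete, then add" idea, using a plain Java list as a stand-in for the index. The `Doc` class, `deleteByUrl`, and `removeAndAdd` are hypothetical; in Lucene the delete step would be a delete-by-`Term` on the "url" field (the exact method name differs between versions), and later releases also offer `IndexWriter.updateDocument(Term, Document)`, which performs both steps.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteThenAdd {

    // Hypothetical stand-in for a stored document with a "url" field.
    static final class Doc {
        final String url;
        final String content;

        Doc(String url, String content) {
            this.url = url;
            this.content = content;
        }
    }

    // Hypothetical in-memory stand-in for the index.
    private final List<Doc> docs = new ArrayList<>();

    // Mimics a delete-by-term on the "url" field: removes every matching doc.
    void deleteByUrl(String url) {
        docs.removeIf(d -> d.url.equals(url));
    }

    // Delete first, then add: afterwards exactly one doc carries this url,
    // no matter how many copies existed before.
    void removeAndAdd(String url, String content) {
        deleteByUrl(url);
        docs.add(new Doc(url, content));
    }

    long countByUrl(String url) {
        return docs.stream().filter(d -> d.url.equals(url)).count();
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        DeleteThenAdd index = new DeleteThenAdd();
        // Seed the index with a duplicate, as an existing index might have.
        index.docs.add(new Doc("http://a.example/", "old copy 1"));
        index.docs.add(new Doc("http://a.example/", "old copy 2"));
        // Re-process every url found; repeated urls are harmless.
        String[] found = {"http://a.example/", "http://b.example/", "http://a.example/"};
        for (String url : found) {
            index.removeAndAdd(url, "content of " + url);
        }
        System.out.println(index.size()); // 2
    }
}
```

Because every add is preceded by a delete on the same key, no bookkeeping of doc ids is needed; the trade-off, as noted above, is that every document gets rewritten.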

deleting duplicate documents from my index

2006-01-28 Thread gekkokid

Hi, I'm trying to delete duplicate documents from my index; the unique identifier is the document's url (aka field "url"). My initial thought on how to accomplish this is to open the index via a reader, sort the documents by url, and then iterate through them looking for a match with the c
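The sort-and-scan approach from the original question can be sketched with plain Java collections in place of an index reader. This is an illustration under stated assumptions: the `Doc` class (a doc id plus its "url" value) and `findDuplicates` are hypothetical, and in a real index the returned ids would be passed to whatever delete-by-doc-id call your Lucene version provides.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortScanDedup {

    // Hypothetical stand-in for one index entry: its doc id and "url" field.
    static final class Doc {
        final int id;
        final String url;

        Doc(int id, String url) {
            this.id = id;
            this.url = url;
        }
    }

    // Sorts a copy of the docs by url (ties broken by doc id), then scans once:
    // any doc whose url equals the previous doc's url is a duplicate.
    // The first doc (lowest id) for each url is kept.
    static List<Integer> findDuplicates(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing((Doc d) -> d.url).thenComparingInt(d -> d.id));

        List<Integer> toDelete = new ArrayList<>();
        String previousUrl = null;
        for (Doc d : sorted) {
            if (d.url.equals(previousUrl)) {
                toDelete.add(d.id);
            } else {
                previousUrl = d.url;
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc(0, "http://a.example/"),
                new Doc(1, "http://b.example/"),
                new Doc(2, "http://a.example/"),
                new Doc(3, "http://c.example/"),
                new Doc(4, "http://b.example/"));
        System.out.println(findDuplicates(docs)); // [2, 4]
    }
}
```

Sorting makes duplicates adjacent, so only the previous url has to be remembered. A single pass with a `HashSet` of seen urls would avoid the O(n log n) sort at the cost of holding every distinct url in memory.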