Re: duplication checking while indexing

2008-12-30 Thread Chris Lu
…are working on (Near) Duplicate Detection. I think the work is in Solr's JIRA, but some of it might be applicable to Lucene. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch …
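Otis's pointer is to near-duplicate detection, where two entries count as duplicates when they are similar enough rather than byte-for-byte identical. The snippets do not show what the Solr JIRA work actually does, so the following is only a minimal sketch of one standard technique, word shingling with Jaccard similarity, in plain Java. The shingle size `k` and the `threshold` are hypothetical parameters you would tune, and `NearDupSketch` is an illustrative name, not Lucene or Solr API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class NearDupSketch {
    // Lowercase the text, split on whitespace, and collect every run of k consecutive words.
    static Set<String> shingles(String text, int k) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Set<String> out = new HashSet<>();
        if (words.length < k) {
            // Text shorter than one shingle: treat the whole text as a single shingle.
            out.add(String.join(" ", words));
            return out;
        }
        for (int i = 0; i + k <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + k)));
        }
        return out;
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 1.0 : (double) intersection.size() / union.size();
    }

    // Hypothetical decision rule: near-duplicate when similarity reaches the threshold.
    static boolean nearDuplicate(String x, String y, int k, double threshold) {
        return jaccard(shingles(x, k), shingles(y, k)) >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(jaccard(shingles("the quick brown fox jumps", 2),
                                   shingles("the quick brown fox leaps", 2))); // 0.6
        System.out.println(nearDuplicate("Apache Lucene", "apache lucene", 2, 0.9)); // true
    }
}
```

Comparing every incoming entry against every indexed one this way is quadratic; at larger scale the shingle sets are usually reduced to compact fingerprints (MinHash, for example) before comparison.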

Re: duplication checking while indexing

2008-12-29 Thread liu Ivan
…Duplicate Detection. I think the work is in Solr's JIRA, but some of it might be applicable to Lucene. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch ----- Original Message ---- …

Re: duplication checking while indexing

2008-12-29 Thread Chris Lu
…Nutch ----- Original Message ---- From: Chris Lu To: "java-user@lucene.apache.org" Sent: Monday, December 29, 2008 4:55:14 AM Subject: duplication checking while indexing I am wondering whether there is …

Re: duplication checking while indexing

2008-12-29 Thread Otis Gospodnetic
…To: "java-user@lucene.apache.org" Sent: Monday, December 29, 2008 4:55:14 AM Subject: duplication checking while indexing I am wondering whether there is an easy way to avoid duplication while indexing, just using the index being created, without creating other data …

duplication checking while indexing

2008-12-29 Thread Chris Lu
I am wondering whether there is an easy way to avoid duplication while indexing, just using the index being created, without creating other data structures. In some cases, the incoming document list can contain duplicates, for example when creating spell-checking indexes for phrases. Each phrase is o…
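One way to get this behaviour with Lucene alone is `IndexWriter.updateDocument(Term, Document)`, which first deletes any document containing the given term and then adds the new one; if each phrase is also stored in a key field indexed without tokenization, re-adding the same phrase leaves only one copy. The snippets above do not show whether the thread settled on this. The sketch below is stdlib-only Java: `ToyIndex` and its field names are hypothetical, an in-memory stand-in that mimics that delete-then-add contract, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DedupSketch {
    // Hypothetical stand-in for an index: a list of documents, each a field -> value map.
    static final class ToyIndex {
        private final List<Map<String, String>> docs = new ArrayList<>();

        // Delete-then-add, like IndexWriter.updateDocument(Term, Document):
        // drop every document whose `field` equals `value`, then append the new one.
        // A real index finds matches through its term dictionary, not a linear scan.
        void updateDocument(String field, String value, Map<String, String> doc) {
            docs.removeIf(d -> value.equals(d.get(field)));
            docs.add(doc);
        }

        int size() {
            return docs.size();
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        String[] phrases = {"apache lucene", "solr", "apache lucene", "nutch", "solr"};
        for (String phrase : phrases) {
            // The phrase itself is the unique key, so a repeat replaces the earlier copy.
            index.updateDocument("phrase", phrase, Map.of("phrase", phrase));
        }
        System.out.println(index.size()); // 3
    }
}
```

The design choice here is to let the index enforce uniqueness on a key rather than keeping a separate "seen" set, which matches the wish to avoid creating other data structures.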