> > > are working on (Near) Duplicate Detection. I think the
> > > work is in Solr's JIRA, but some of it might be applicable to Lucene.
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > - Original Message ----
> > From: Chris Lu
> > To: "java-user@lucene.apache.org"
> > Sent: Monday, December 29, 2008 4:55:14 AM
> > Subject: duplication checking while indexing
> >
I am wondering whether there is an easy way to avoid duplication while
indexing, just using the index being created, without creating other data
structures.
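
One way to read the question above is as "delete by unique key, then add", so that the index itself is the only bookkeeping. Lucene exposes this as IndexWriter.updateDocument(Term, Document). The sketch below is only a toy Python model of that semantics, not Lucene code: ToyIndex, update_document, and index_phrases are hypothetical names, and lowercasing the phrase to form the key is an assumption about what counts as a duplicate.

```python
class ToyIndex:
    """Toy stand-in for an index keyed on a unique field (hypothetical, not Lucene)."""

    def __init__(self):
        # Maps the unique key term to the stored document. This plays the
        # role of the index's own term dictionary, not a side structure.
        self._docs_by_key = {}

    def update_document(self, key, doc):
        """Replace any document stored under `key` with `doc`.

        Returns True if an earlier document was replaced, False otherwise.
        """
        replaced = key in self._docs_by_key
        self._docs_by_key[key] = doc
        return replaced

    def num_docs(self):
        return len(self._docs_by_key)


def index_phrases(phrases):
    """Index phrases so that repeated phrases end up as a single document."""
    index = ToyIndex()
    for phrase in phrases:
        # Assumption: phrases that differ only in case or surrounding
        # whitespace are treated as duplicates.
        key = phrase.strip().lower()
        index.update_document(key, {"phrase": phrase})
    return index


if __name__ == "__main__":
    index = index_phrases(["new york", "New York", "san jose", "new york"])
    print(index.num_docs())  # prints 2
```

With this approach the last copy of a duplicate wins. Whether that is acceptable, and how costly the implied delete is on a large index, would need to be checked against the real workload.
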
In some cases, the incoming document list can contain duplicates, for example
when creating spell-checking indexes for phrases. Each phrase is o