Hi Andrej!
I'm looking at fuzzy signatures for near-duplicate detection and
I have seen your TextProfileSignature. The question is: if I index
the documents with their text signature, is there a way to filter near
duplicates at search time without comparing each document with all oth
Hi Karl!
I'm interested in near-duplicate detection based on termFreqVectors. Right
now I'm comparing all documents with each other (calculating the angle)...
Is there a way to avoid that?
Thanks!
Beto
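
For reference, the all-pairs comparison described above can be sketched roughly as follows. This is a minimal Python illustration, not Lucene code: the function names (term_freq_vector, cosine_similarity, near_duplicate_pairs), the whitespace tokenization, and the 0.9 threshold are all assumptions made for the example. It computes the cosine of the angle between term-frequency vectors and shows why the cost grows quadratically with the number of documents.

```python
import math
from collections import Counter
from itertools import combinations


def term_freq_vector(text):
    """Build a term -> frequency mapping (assumed tokenization: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(freq * b[term] for term, freq in a.items() if term in b)
    norm_a = math.sqrt(sum(f * f for f in a.values()))
    norm_b = math.sqrt(sum(f * f for f in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def near_duplicate_pairs(docs, threshold=0.9):
    """Compare every document with every other one: n * (n - 1) / 2 comparisons."""
    vectors = {doc_id: term_freq_vector(text) for doc_id, text in docs.items()}
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(vectors.items(), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs
```

With n documents this performs n * (n - 1) / 2 similarity computations, which is the cost the question asks how to avoid.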
karl wettin wrote:
On 17 Oct 2006, at 17:54, Find Me wrote:
How to eliminate near duplicates from
Hi, I'm facing the transaction problem too: I have Documents that are
represented by a Business Object (persisted in a DB with an ORM),
indexed with Lucene, and finally stored in the file system. So it's very
difficult to maintain consistency in an error scenario.
The main problem is that if