Re: near duplicates

2006-10-24 Thread Beto Siless
Hi Andrej! I'm taking a look at fuzzy signatures for near duplicate detection and I have seen your TextProfileSignature. The question is: If I index the documents with their text signature, is there a way to filter near duplicates at search time without comparing each document with all oth

Re: near duplicates

2006-10-24 Thread Beto Siless
Hi Karl! I'm interested in near duplicate detection based on termFreqVectors. Right now I'm comparing all documents with each other (calculating the angle)... Is there a way to avoid that? Thanks! Beto karl wettin wrote: On 17 Oct 2006 at 17:54, Find Me wrote: How to eliminate near duplicates from
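
For readers following the thread, the pairwise "angle" comparison mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: term-frequency vectors are modeled as plain Java maps rather than Lucene's own term vector classes, and the names TermVectorAngle, termFrequencies, cosine and angle are hypothetical, not taken from the original messages or from Lucene.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: a term-frequency vector is
// modeled here as a plain map from term to occurrence count.
class TermVectorAngle {

    // Build a term-frequency map with a naive lowercase, non-word split.
    // (Assumption: a real setup would reuse the analyzer used at index time.)
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                tf.merge(term, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between two vectors; 0.0 if either one is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        if (na == 0.0 || nb == 0.0) {
            return 0.0;
        }
        return dot / (na * nb);
    }

    // Angle in radians; the cosine is clamped to guard against rounding drift.
    static double angle(Map<String, Integer> a, Map<String, Integer> b) {
        double c = Math.max(-1.0, Math.min(1.0, cosine(a, b)));
        return Math.acos(c);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termFrequencies("the quick brown fox");
        Map<String, Integer> d2 = termFrequencies("the quick brown dog");
        System.out.println("cosine = " + cosine(d1, d2));
        System.out.println("angle  = " + angle(d1, d2));
    }
}
```

Running this for every pair of documents is the quadratic cost the question is about: n documents need n*(n-1)/2 calls to cosine.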

Re: Lucene & Transactional semantics

2005-11-17 Thread Beto Siless
Hi, I have the transaction problem too: I have Documents which are represented by a Business Object (persisted in a DB with an ORM), indexed with Lucene and finally stored in the file system. So it's very difficult to maintain consistency in an error scenario. The main problem is that if