Have you considered using document similarity metrics such as Cosine Similarity?
On Thu, Dec 30, 2010 at 6:05 AM, Amel Fraisse <amel.frai...@gmail.com> wrote: > Hello, > > I am using Lucene for plagiarism detection. > > The goal is that: when I have a new document, I will check on the solr index > if there is a document that contain some common chunk. > > So to compute similarity between the query and a source document I would use > this formula : > > Score (suspicious document, source document) = Number of common chunk > between source document and suspicious document / Number of total chunk in > the suspicious document. > > So I have to change the scoring formula in the Similarity class. > > How can I change the scoring formula? ( by customizing only the Similarity > class? or Scorer?) > > Do you have an Example of this use case? > > Thank for your help. > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org