Re: Using Lucene/Solr for Plagiarism detection

2010-12-30 Thread Lance Norskog
The MoreLikeThis feature may be exactly what you want. Try it out.
On Thu, Dec 30, 2010 at 8:28 AM, Amel Fraisse wrote:
> Hello,
>
> No, I'm not using cosine similarity metrics.
>
> 2010/12/30 Shashi Kant
>> Have you considered using document similarity metrics such as Cosine
>> Similarity?

Re: Comment in query-parser?

2010-12-30 Thread Federico Fissore
Paul Libbrecht wrote on 30/12/2010 at 20:15:
> [...] Is there anything in the query parser that would be ignored until the end of the line? Currently I'm using a fancy field but it introduces a useless query.
I would make a CommentFilter and put it in an Analyzer I would give to
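For illustration only, one way to get "ignored till the end of the line" behaviour is to strip comments from the query string before it ever reaches the query parser. This is a pre-processing step outside Lucene, not the CommentFilter/Analyzer approach mentioned above. The function name `strip_comments` and the `#` marker are assumptions made for this sketch; the query parser itself defines no comment syntax that this relies on, and escaped quotes are not handled.

```python
def strip_comments(query, marker="#"):
    """Drop everything from `marker` to the end of each line.

    Hypothetical helper: the marker is our own convention, not Lucene syntax.
    Markers inside double-quoted phrases are left alone.
    """
    kept = []
    for line in query.splitlines():
        in_quotes = False
        cut = len(line)
        for i, ch in enumerate(line):
            if ch == '"':
                # Toggle phrase state so a marker inside a phrase is preserved.
                in_quotes = not in_quotes
            elif not in_quotes and line.startswith(marker, i):
                cut = i
                break
        stripped = line[:cut].strip()
        if stripped:
            kept.append(stripped)
    # Join the surviving clauses into a single-line query string.
    return " ".join(kept)


if __name__ == "__main__":
    annotated = 'title:lucene # main topic\n+type:"a # b" # phrase kept'
    print(strip_comments(annotated))
```

The cleaned string can then be handed to whichever query parser is in use, so the comments never produce an extra clause.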

Comment in query-parser?

2010-12-30 Thread Paul Libbrecht
I'm increasingly involved in preparing dedicated pages that list resources of our servers according to an elaborate query that I receive as a human description and "implement" as a query-parser query. In doing this I regularly use "indexed-doc" views. The implementation is thus a query that could

Re: Using Lucene/Solr for Plagiarism detection

2010-12-30 Thread Amel Fraisse
Hello, No, I'm not using cosine similarity metrics.
2010/12/30 Shashi Kant
> Have you considered using document similarity metrics such as Cosine
> Similarity?
>
> On Thu, Dec 30, 2010 at 6:05 AM, Amel Fraisse wrote:
> > Hello,
> >
> > I am using Lucene for plagiarism detection.
> >
> > Th

Re: Using Lucene/Solr for Plagiarism detection

2010-12-30 Thread Shashi Kant
Have you considered using document similarity metrics such as Cosine Similarity?
On Thu, Dec 30, 2010 at 6:05 AM, Amel Fraisse wrote:
> Hello,
>
> I am using Lucene for plagiarism detection.
>
> The goal is that: when I have a new document, I will check on the solr index
> if there is a document
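As a rough sketch of what cosine similarity between two documents means, the snippet below compares raw term-frequency vectors in plain Python. It does not use Lucene or Solr; the function names and the simple `\w+` tokenizer are assumptions for this example, and a real setup would more likely use analyzed, weighted term vectors from the index.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens (naive tokenizer)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-frequency vectors."""
    a, b = term_counts(text_a), term_counts(text_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    suspicious = "the quick brown fox jumps over the lazy dog"
    source = "a quick brown fox leaps over a lazy dog"
    print(round(cosine_similarity(suspicious, source), 3))
```

A score near 1 means the two texts use much the same vocabulary in similar proportions; a score of 0 means they share no terms. Word order is ignored, so this alone does not show that a chunk was copied verbatim.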

Re: SpanNearQuery doesn't return document if the same word within query is repeated

2010-12-30 Thread viruslviv
Sorry, there was an error in the query in my post. The correct one is (the code has the correct version): (intend within 3 of messag) within 5 of messag within 3 of address.

SpanNearQuery doesn't return document if the same word within query is repeated

2010-12-30 Thread viruslviv
Hello Lucene community! I have been working with Solr/Lucene for about half a year and have run into an interesting issue with SpanNearQuery queries. Suppose we have the following text within a document (you can find the whole document text below): "intended recipient of this message or if this message has been a

Using Lucene/Solr for Plagiarism detection

2010-12-30 Thread Amel Fraisse
Hello, I am using Lucene for plagiarism detection. The goal is this: when I have a new document, I check the Solr index for a document that contains some common chunks. So, to compute the similarity between the query and a source document, I would use this formula: Score (suspicious do