Re: Doc-Doc Similarity Matrix Construction

2009-06-29 Thread Amir Hossein Jadidinejad
It's exactly my question: http://www.mail-archive.com/lucene-u...@jakarta.apache.org/msg04915.html

--- On Mon, 6/29/09, Amir Hossein Jadidinejad wrote:
From: Amir Hossein Jadidinejad
Subject: Doc-Doc Similarity Matrix Construction
To: java-user@lucene.apache.org
Date: Monday, June 29, 20

Re: Doc-Doc Similarity Matrix Construction

2009-06-29 Thread Mark Harwood
See MoreLikeThis in the contrib/queries folder. It speeds up similarity comparisons by using only the most significant words from a document as search terms.

On 29 Jun 2009, at 20:14, Amir Hossein Jadidinejad wrote:
Hi, It's my first experiment with Lucene. Please help me.
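
The "most significant words only" idea can be illustrated without touching Lucene at all. The following is a minimal sketch, not MoreLikeThis's actual code: the class name SignificantTerms, the method topTerms, and the smoothed weight tf * ln(1 + N/df) are all assumptions chosen for illustration. It ranks a document's terms by that weight and keeps the k best, which could then serve as the search terms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class SignificantTerms {

    // Returns the k terms of one document with the highest tf * idf score.
    // termFreq: term -> count in this document
    // docFreq:  term -> number of documents in the collection containing the term
    static List<String> topTerms(Map<String, Integer> termFreq,
                                 Map<String, Integer> docFreq,
                                 int numDocs, int k) {
        final Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            // Assumption: an unseen term is treated as occurring in one document.
            int df = docFreq.getOrDefault(e.getKey(), 1);
            double idf = Math.log(1.0 + (double) numDocs / df);
            score.put(e.getKey(), e.getValue() * idf);
        }
        List<String> terms = new ArrayList<>(score.keySet());
        // Highest score first; ties broken alphabetically so the result is stable.
        terms.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }
}
```

Very common words get a low idf and drop out of the list even when they occur often, so each comparison only has to look at a handful of terms per document.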

Doc-Doc Similarity Matrix Construction

2009-06-29 Thread Amir Hossein Jadidinejad
Hi, It's my first experiment with Lucene. Please help me. I'm going to index a set of documents and create a feature vector for each of them. This vector contains all the terms that belong to the document, weighted using TF-IDF. After that I want to compute the cosine similarity between all documents and
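
The computation described in this question (TF-IDF vector per document, then cosine similarity between every pair) can be sketched in plain Java as follows. This is only an illustration under stated assumptions: it does not use the Lucene API, the documents are assumed to be already tokenized, the class and method names are hypothetical, and the weight tf * ln(1 + N/df) is one common TF-IDF variant rather than Lucene's own scoring formula.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class DocSimilarity {

    // Builds one sparse TF-IDF vector (term -> weight) per tokenized document.
    static List<Map<String, Double>> tfidfVectors(List<List<String>> docs) {
        // Document frequency: in how many documents each term occurs.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> tf = new HashMap<>();
            for (String term : doc) {
                tf.merge(term, 1.0, Double::sum);
            }
            Map<String, Double> vec = new HashMap<>();
            for (Map.Entry<String, Double> e : tf.entrySet()) {
                // Assumed smoothing: ln(1 + N/df) keeps every weight above zero.
                double idf = Math.log(1.0 + (double) n / df.get(e.getKey()));
                vec.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    // Cosine similarity of two sparse vectors; 0 if either vector is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    // Fills the symmetric doc-doc matrix, computing each pair only once.
    static double[][] similarityMatrix(List<List<String>> docs) {
        List<Map<String, Double>> vecs = tfidfVectors(docs);
        int n = vecs.size();
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = cosine(vecs.get(i), vecs.get(j));
                m[i][j] = s;
                m[j][i] = s;
            }
        }
        return m;
    }
}
```

The full matrix needs on the order of N^2 comparisons, so this direct approach only suits small collections; for larger ones, restricting each document to its most significant terms is one way to reduce the work.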