Re: Document Term matrix

2014-11-11 Thread Ahmet Arslan
Hi, Mahout and Carrot2 can cluster the documents from a Lucene index. ahmet On Tuesday, November 11, 2014 10:37 PM, Elshaimaa Ali wrote: Hi All, I have a Lucene index built with Lucene 4.9 for 584 text documents. I need to extract a Document-term matrix, and Document Document similarity matri

Re: Document Term matrix

2014-11-11 Thread Paul Libbrecht
The project semanticvectors might be doing what you are looking for. paul On 11 nov. 2014, at 22:37, parnab kumar wrote: > hi, > > While indexing the documents, store the Term Vectors for the content > field. Now for each document you will have an array of terms and their > corresponding fre

Re: Document Term matrix

2014-11-11 Thread parnab kumar
Hi, while indexing the documents, store the term vectors for the content field. For each document you will then have an array of terms and their corresponding frequencies in that document. Using the IndexReader you can retrieve these term vectors. Similarity between two documents can be computed as
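
A minimal sketch of the idea above, in plain Python rather than against the Lucene API. It assumes the per-document term frequencies have already been read out of the index as dictionaries (the `docs` list below is made-up sample data, and `build_matrix`, `cosine` and `similarity_matrix` are hypothetical helper names). The preview above is cut off before it names a similarity measure, so cosine similarity is an assumption here, chosen only because it is a common option for term-frequency vectors.

```python
import math


def build_matrix(term_freqs):
    """Build a document-term matrix from a list of {term: count} dicts.

    Returns the sorted vocabulary and one row of counts per document.
    """
    vocab = sorted({term for tf in term_freqs for term in tf})
    matrix = [[tf.get(term, 0) for term in vocab] for tf in term_freqs]
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(matrix):
    """Pairwise document-document similarity for all rows of the matrix."""
    return [[cosine(u, v) for v in matrix] for u in matrix]


# Made-up term frequencies standing in for term vectors read from an index.
docs = [
    {"lucene": 2, "index": 1},
    {"lucene": 1, "search": 3},
    {"matrix": 4},
]

vocab, dtm = build_matrix(docs)
sim = similarity_matrix(dtm)
```

Here `dtm` has one row per document and one column per term in `vocab`, and `sim[i][j]` is the similarity between documents `i` and `j`. For 584 documents the full pairwise matrix is small enough to hold in memory; a weighting such as tf-idf could replace the raw counts without changing the structure.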