Re: Document Term matrix

2014-11-11 Thread Ahmet Arslan
Hi, Mahout and Carrot2 can cluster the documents from a Lucene index.
ahmet
On Tuesday, November 11, 2014 10:37 PM, Elshaimaa Ali wrote:
Hi All, I have a Lucene index built with Lucene 4.9 for 584 text documents, I need to extract a Document-term matrix, and Document Document similarity

Re: Document Term matrix

2014-11-11 Thread Paul Libbrecht
e idf
> of the term with the frequency to re weight the vectors.
>
> Thanks,
> Parnab
>
> On Tue, Nov 11, 2014 at 8:36 PM, Elshaimaa Ali wrote:
>
>> Hi All,
>> I have a Lucene index built with Lucene 4.9 for 584 text documents, I need
>> to extract a D

Re: Document Term matrix

2014-11-11 Thread parnab kumar
Lucene index built with Lucene 4.9 for 584 text documents, I need
> to extract a Document-term matrix, and Document Document similarity matrix
> in-order to use it to cluster the documents. My questions:1- How can I
> extract the matrix and compute the similarity between documents in
>

Document Term matrix

2014-11-11 Thread Elshaimaa Ali
Hi All, I have a Lucene index built with Lucene 4.9 for 584 text documents. I need to extract a document-term matrix and a document-document similarity matrix in order to use them to cluster the documents. My questions: 1- How can I extract the matrix and compute the similarity between documents in
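For readers following the thread, the two structures asked about here can be illustrated with a minimal sketch. It deliberately does not use any Lucene API: the in-memory token lists stand in for terms that would be read from the index, and the class name `DocTermMatrixSketch` and its methods are hypothetical names chosen for illustration. Cells hold raw term counts (no idf re-weighting), and cosine similarity is an assumed choice of document-document measure, not something the original poster specified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Lucene.
class DocTermMatrixSketch {

    /** Collects the sorted set of distinct terms over all documents. */
    static List<String> vocabulary(List<List<String>> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (List<String> doc : docs) {
            terms.addAll(doc);
        }
        return new ArrayList<>(terms);
    }

    /** Rows are documents, columns are vocabulary terms, cells are raw term counts. */
    static double[][] termFrequencyMatrix(List<List<String>> docs, List<String> vocab) {
        Map<String, Integer> column = new HashMap<>();
        for (int i = 0; i < vocab.size(); i++) {
            column.put(vocab.get(i), i);
        }
        double[][] matrix = new double[docs.size()][vocab.size()];
        for (int d = 0; d < docs.size(); d++) {
            for (String token : docs.get(d)) {
                Integer col = column.get(token);
                if (col != null) {          // ignore tokens outside the vocabulary
                    matrix[d][col] += 1.0;
                }
            }
        }
        return matrix;
    }

    /** Cosine similarity of two equal-length vectors; 0.0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Symmetric document-document matrix of pairwise cosine similarities. */
    static double[][] similarityMatrix(double[][] docTerm) {
        int n = docTerm.length;
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                sim[i][j] = cosine(docTerm[i], docTerm[j]);
                sim[j][i] = sim[i][j];
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // Toy, already-tokenized documents (made-up data for the sketch).
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene"),
                Arrays.asList("index", "cluster"),
                Arrays.asList("cluster", "cluster"));
        List<String> vocab = vocabulary(docs);
        double[][] docTerm = termFrequencyMatrix(docs, vocab);
        System.out.println("terms: " + vocab);
        for (double[] row : similarityMatrix(docTerm)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With a real index, the per-document term counts would have to come from the index itself (for example from stored term vectors, if the field was indexed with them); that part is left out here because it depends on how the index was built.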

Re: How to get Term Weights (document term matrix)?

2006-11-04 Thread Soeren Pekrul
Reader.norms(String field). The usage of TermQuery in my previous example is a simplification. The documents of my collection have some fields like title, abstract or keywords. The term weights in my document term matrix should include all fields of a document for a word (token). So I used in rea

Re: How to get Term Weights (document term matrix)?

2006-11-03 Thread Chris Hostetter
: searching for the term, iterating the result documents and using the
: score as my term weight for the document term matrix:
...okay, so it sounds like you're defining term weight of a doc/term to be the score of that document when searching for that term. You really, *REALLY* don't want to be

Re: How to get Term Weights (document term matrix)?

2006-11-03 Thread Soeren Pekrul
ng for the term, iterating the result documents and using the score as my term weight for the document term matrix:
TermEnum terms=indexreader.terms();
while(terms.next()) {
    Term term=terms.term();
    // write the term to the external document term matrix
    Hits hits=indexsearcher.search(new Term

Re: How to get Term Weights (document term matrix)?

2006-11-03 Thread Chris Hostetter
cene.apache.org
: To: java-user@lucene.apache.org
: Subject: How to get Term Weights (document term matrix)?
:
: Hello,
:
: I would like to extract and store the document term matrix externally. I
: iterate the terms and the documents for each term:
: TermEnum terms=IndexReader.terms();
: while(t

How to get Term Weights (document term matrix)?

2006-11-02 Thread Soeren Pekrul
Hello, I would like to extract and store the document term matrix externally. I iterate the terms and the documents for each term:
TermEnum terms=IndexReader.terms();
while(terms.next()) {
    TermDocs docs=IndexReader.termDocs(terms.term());
    while(docs.next