Hi,
Mahout and Carrot2 can cluster the documents from a Lucene index.
ahmet
On Tuesday, November 11, 2014 10:37 PM, Elshaimaa Ali
wrote:
Hi All,
I have a Lucene index built with Lucene 4.9 for 584 text documents. I need to
extract a document-term matrix and a document-document similarity
e idf
> of the term with the frequency to reweight the vectors.
>
> Thanks,
> Parnab
>
> On Tue, Nov 11, 2014 at 8:36 PM, Elshaimaa Ali
> wrote:
>
>> Hi All,
>> I have a Lucene index built with Lucene 4.9 for 584 text documents. I need
>> to extract a D
Lucene index built with Lucene 4.9 for 584 text documents. I need
> to extract a document-term matrix and a document-document similarity matrix
> in order to use it to cluster the documents. My questions: 1- How can I
> extract the matrix and compute the similarity between documents in
>
Hi All,
I have a Lucene index built with Lucene 4.9 for 584 text documents. I need to
extract a document-term matrix and a document-document similarity matrix
in order to use it to cluster the documents. My questions: 1- How can I extract
the matrix and compute the similarity between documents in
Reader.norms(String field).
The usage of TermQuery in my previous example is a simplification. The
documents of my collection have some fields like title, abstract or
keywords. The term weights in my document term matrix should include all
fields of a document for a word (token). So I used in rea
: searching for the term, iterating the result documents and using the
: score as my term weight for the document term matrix:
...okay, so it sounds like you're defining the term weight of a doc/term to be
the score of that document when searching for that term.
You really, *REALLY* don't want to be
ng for the term, iterating the result documents and using the
score as my term weight for the document term matrix:
TermEnum terms = indexreader.terms();
while (terms.next()) {
    Term term = terms.term();
    // write the term to the external document term matrix
    Hits hits = indexsearcher.search(new Term
cene.apache.org
: To: java-user@lucene.apache.org
: Subject: How to get Term Weights (document term matrix)?
:
: Hello,
:
: I would like to extract and store the document term matrix externally. I
: iterate the terms and the documents for each term:
: TermEnum terms=IndexReader.terms();
: while(t
Hello,
I would like to extract and store the document term matrix externally. I
iterate the terms and the documents for each term:
TermEnum terms = IndexReader.terms();
while (terms.next()) {
    TermDocs docs = IndexReader.termDocs(terms.term());
    while (docs.next
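
For reference, the weighting discussed in this thread (term frequency reweighted by idf) and the document-document similarity matrix can be sketched in plain Java, independent of Lucene. This is only an illustration under stated assumptions: the class name DocTermMatrix and its methods are hypothetical, documents are assumed to be already tokenized, the idf variant 1 + ln(N/df) is one common choice and not necessarily what Lucene's scoring uses, and cosine similarity is assumed as the similarity measure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: builds a tf-idf document-term
// matrix and a cosine document-document similarity matrix in memory.
public class DocTermMatrix {

    /** Sorted list of distinct terms; a term's position is its column index. */
    public static List<String> vocabulary(List<List<String>> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (List<String> doc : docs) {
            terms.addAll(doc);
        }
        return new ArrayList<>(terms);
    }

    /** Rows are documents, columns are terms, cells are tf * (1 + ln(N / df)). */
    public static double[][] tfIdf(List<List<String>> docs, List<String> vocab) {
        int n = docs.size();
        int v = vocab.size();
        Map<String, Integer> column = new HashMap<>();
        for (int j = 0; j < v; j++) {
            column.put(vocab.get(j), j);
        }
        double[][] m = new double[n][v];
        int[] df = new int[v];
        for (int i = 0; i < n; i++) {
            for (String term : docs.get(i)) {
                m[i][column.get(term)] += 1.0; // raw term frequency
            }
            for (int j = 0; j < v; j++) {
                if (m[i][j] > 0) {
                    df[j]++; // document frequency of term j
                }
            }
        }
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < v; j++) {
                if (m[i][j] > 0) {
                    // Assumed idf variant; other formulas are equally valid.
                    m[i][j] *= 1.0 + Math.log((double) n / df[j]);
                }
            }
        }
        return m;
    }

    /** Cosine of the angle between two vectors; 0 if either vector is all zeros. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int j = 0; j < a.length; j++) {
            dot += a[j] * b[j];
            na += a[j] * a[j];
            nb += b[j] * b[j];
        }
        if (na == 0.0 || nb == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Symmetric matrix of pairwise cosine similarities between matrix rows. */
    public static double[][] similarity(double[][] m) {
        int n = m.length;
        double[][] s = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int k = i; k < n; k++) {
                s[i][k] = cosine(m[i], m[k]);
                s[k][i] = s[i][k];
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // Toy, already-tokenized documents (made up for illustration).
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "term"),
                Arrays.asList("lucene", "term", "term"),
                Arrays.asList("cluster", "documents"));
        List<String> vocab = vocabulary(docs);
        double[][] sim = similarity(tfIdf(docs, vocab));
        System.out.println(vocab);
        for (double[] row : sim) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With a real index, the token lists would have to come from the indexed fields (for example via stored term vectors); that part is deliberately left out here because it depends on the Lucene version in use.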