Problem solved: I forgot to specify the id field of the Lucene index.
-----Original Message-----
From: Stefan Kreuzer <[email protected]>
To: user <[email protected]>
Sent: Sat, 9 Mar 2013 6:37 pm
Subject: Need help with ClusterLabels / Log Likelihood Ratio (Mahout 0.7)
Hi,
I have a problem with cluster labels. I am trying to use
ClusterLabels.getLabels(), but in
ClusterLabels.scoreDocumentFrequencies I run into situations where k22
becomes negative, which raises an exception.
The numbers I get are:
corpusSize: 435
clusterSize: 181
outDF: 277
=> long k22 = corpusSize - clusterSize - outDF;
evaluates to -23, but it has to be at least zero.
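For context, the label score is a log-likelihood ratio over a 2x2 contingency table of document counts: k11 = cluster documents containing the term, k12 = cluster documents without it, k21 = documents outside the cluster containing it (outDF), k22 = documents outside the cluster without it. With the numbers above there are only 435 - 181 = 254 documents outside the cluster, yet outDF = 277, so the counts cannot all describe the same set of documents. That is consistent with cluster members not being matched to index documents by id. The sketch below is a hypothetical helper (LlrTable, table() and logLikelihoodRatio() are illustrative names, not Mahout's API) that builds the table, rejects negative cells with a descriptive message, and computes Dunning's G^2 statistic in the entropy form. inDF was not reported, so the value in main() is a placeholder.

```java
/**
 * Hypothetical helper, not Mahout's API: builds the 2x2 contingency table used
 * for log-likelihood-ratio cluster labelling and rejects inconsistent counts.
 */
public final class LlrTable {

    private LlrTable() {}

    /** Returns {k11, k12, k21, k22}; throws if any cell would be negative. */
    public static long[] table(long inDF, long outDF, long clusterSize, long corpusSize) {
        long k11 = inDF;                             // cluster docs containing the term
        long k12 = clusterSize - inDF;               // cluster docs without the term
        long k21 = outDF;                            // non-cluster docs containing the term
        long k22 = corpusSize - clusterSize - outDF; // non-cluster docs without the term
        if (k11 < 0 || k12 < 0 || k21 < 0 || k22 < 0) {
            throw new IllegalArgumentException(String.format(
                "negative cell: k11=%d k12=%d k21=%d k22=%d (outDF=%d, docs outside cluster=%d)",
                k11, k12, k21, k22, outDF, corpusSize - clusterSize));
        }
        return new long[] {k11, k12, k21, k22};
    }

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy: xLogX(sum) minus the sum of xLogX(each count). */
    private static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            sum += c;
            parts += xLogX(c);
        }
        return xLogX(sum) - parts;
    }

    /** Dunning's G^2 statistic for a 2x2 table {k11, k12, k21, k22}. */
    public static double logLikelihoodRatio(long[] k) {
        double rowEntropy = entropy(k[0] + k[1], k[2] + k[3]);
        double columnEntropy = entropy(k[0] + k[2], k[1] + k[3]);
        double matrixEntropy = entropy(k[0], k[1], k[2], k[3]);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    public static void main(String[] args) {
        // corpusSize, clusterSize and outDF are from the report; inDF = 100 is a placeholder.
        try {
            table(100, 277, 181, 435);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating the cells before scoring makes the failure point at the inconsistent document counts rather than at the statistic itself.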
I have no idea what mistake on my part is causing this. I use the same
Lucene analyzer for creating the Mahout sequence files and for the Lucene
index that is passed as a parameter to the ClusterLabels constructor.
Regards
Stefan