Problem solved: I forgot to specify the id field of the Lucene index.

-----Original Message-----
From: Stefan Kreuzer <[email protected]>
To: user <[email protected]>
Sent: Sat, 9 Mar 2013 6:37 pm
Subject: Need help with ClusterLabels / Log Likelihood Ratio (Mahout 0.7)


Hi,

I have a problem with cluster labels. I am trying to use
ClusterLabels.getLabels(), but in
ClusterLabels.scoreDocumentFrequencies I run into situations where k22
becomes negative, yielding an exception.
The numbers I get are:
corpusSize : 435
clusterSize: 181
outDF: 277
  => long k22 = corpusSize - clusterSize - outDF;
    becomes -23 but has to be at least zero.
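
For reference, the arithmetic above can be reproduced in isolation. This is only a minimal sketch: the class name K22Check and the helper maxOutDF are hypothetical and not part of Mahout, and it assumes that k22 counts the documents outside the cluster that do not contain the term. Under that assumption outDF can be at most corpusSize - clusterSize, so the reported counts are inconsistent with each other.

```java
public class K22Check {
    /** The expression quoted above: corpusSize - clusterSize - outDF. */
    static long k22(long corpusSize, long clusterSize, long outDF) {
        return corpusSize - clusterSize - outDF;
    }

    /** Largest outDF that still keeps k22 at zero or above (hypothetical helper). */
    static long maxOutDF(long corpusSize, long clusterSize) {
        return corpusSize - clusterSize;
    }

    public static void main(String[] args) {
        // Numbers reported in this message.
        long corpusSize = 435;
        long clusterSize = 181;
        long outDF = 277;

        System.out.println("max consistent outDF = " + maxOutDF(corpusSize, clusterSize));
        System.out.println("k22 = " + k22(corpusSize, clusterSize, outDF));
    }
}
```

With these numbers only 254 documents lie outside the cluster, yet 277 are counted as containing the term outside it, which is what pushes k22 to -23.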

I have no clue what mistake of mine is causing this. I use the same
Lucene analyzer for creating the Mahout sequence and for the Lucene
index that is passed as a parameter to the ClusterLabels constructor.

Regards
Stefan


