Hi,
I have a problem with cluster labels. I am trying to use
ClusterLabels.getLabels(), but in
ClusterLabels.scoreDocumentFrequencies I run into situations where k22
becomes negative, which causes an exception.
The numbers I get are:
corpusSize: 435
clusterSize: 181
outDF: 277
=> long k22 = corpusSize - clusterSize - outDF;
becomes -23 but has to be at least zero.
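To make the arithmetic concrete, here is a minimal standalone sketch that reproduces the subtraction with the numbers above. It is not the Mahout code itself: the class K22Check and both method names are made up for illustration, and only the subtraction is taken from the line quoted above.

```java
public class K22Check {
    // Same subtraction as the quoted line; parameter names follow the post.
    static long k22(long corpusSize, long clusterSize, long outDF) {
        return corpusSize - clusterSize - outDF;
    }

    // k22 is non-negative exactly when clusterSize + outDF <= corpusSize.
    // (Hypothetical helper, not part of Mahout.)
    static boolean countsAreConsistent(long corpusSize, long clusterSize, long outDF) {
        return clusterSize + outDF <= corpusSize;
    }

    public static void main(String[] args) {
        long corpusSize = 435, clusterSize = 181, outDF = 277;
        // prints "k22 = -23"
        System.out.println("k22 = " + k22(corpusSize, clusterSize, outDF));
        // prints "consistent = false"
        System.out.println("consistent = "
                + countsAreConsistent(corpusSize, clusterSize, outDF));
    }
}
```

In other words, clusterSize + outDF is 458, which exceeds corpusSize (435) by 23, so the three counts cannot all describe the same set of documents.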
I have no clue what mistake of mine is causing this. I use the same
Lucene analyzer for creating the Mahout sequence and for the Lucene
index that is passed as a parameter to the ClusterLabels constructor.
Regards
Stefan