Dear all,

I am having trouble retrieving the cluster probabilities of instances.

I am clustering instances using the FuzzyKMeansDriver. Afterwards, I read the WeightedVectorWritable instances from the clusteredPoints file (e.g. part-m-0).

1.) When I cluster sequentially (no map-reduce), the weights of the vectors are reasonable probabilities for the clusters. However, when I run FuzzyKMeansDriver with sequential=false, the weight of each vector equals one for EVERY cluster, so the weights do not even sum to 1. Am I doing something wrong here?

2.) I tried to circumvent the problem by using the FuzzyKMeansClusterer: after clustering, I retrieved the final clusters (class Cluster) and calculated the distance of every instance to each of the cluster centers. Then I calculated the probabilities for each cluster using the computeProbWeight method of FuzzyKMeansClusterer. Interestingly, these probabilities differ from the ones I get from the WeightedVectorWritable instances in the clusteredPoints file when clustering with sequential=true. Why is there a difference between the vector weights and the pdfs?

Thank you all in advance,
Sebastian
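
For reference, the manual computation described in 2.) can be sketched roughly as follows. This is only a minimal sketch that assumes the textbook fuzzy c-means membership formula, u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)), with fuzziness m > 1. The class FuzzyMembershipSketch and its method memberships are hypothetical names for illustration. They are not part of Mahout, and this is not necessarily what computeProbWeight does internally.

```java
// Hypothetical helper for illustration only; not part of Mahout.
final class FuzzyMembershipSketch {

    /**
     * Computes fuzzy memberships of one point from its distances to all
     * cluster centers, using the textbook fuzzy c-means formula.
     *
     * @param distances distances[i] is the distance from the point to center i
     * @param m         fuzziness parameter, must be greater than 1
     * @return memberships that sum to 1 across all clusters
     */
    static double[] memberships(double[] distances, double m) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("fuzziness m must be > 1");
        }
        int k = distances.length;
        double[] u = new double[k];

        // If the point coincides with one or more centers, split the
        // membership evenly among those centers to avoid dividing by zero.
        int zeros = 0;
        for (double d : distances) {
            if (d == 0.0) {
                zeros++;
            }
        }
        if (zeros > 0) {
            for (int i = 0; i < k; i++) {
                u[i] = (distances[i] == 0.0) ? 1.0 / zeros : 0.0;
            }
            return u;
        }

        double exponent = 2.0 / (m - 1.0);
        for (int i = 0; i < k; i++) {
            double sum = 0.0;
            for (int j = 0; j < k; j++) {
                sum += Math.pow(distances[i] / distances[j], exponent);
            }
            u[i] = 1.0 / sum;
        }
        return u;
    }
}
```

With distances {1.0, 3.0} and m = 2, this gives memberships of 0.9 and 0.1, which sum to 1. The result depends on both m and the distance measure, so both would need to match the clustering run for a like-for-like comparison with the weights in clusteredPoints.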
