Yes, that's exactly it. The values are not comparable, since normalization also shrinks all distances. Squared error is not an absolute metric.
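A minimal sketch (plain Python, hypothetical 1-D data) of why the two WSSSE values can't be compared: squared distances scale with the square of whatever factor the normalization applies, so a smaller WSSSE after rescaling says nothing about clustering quality.

```python
def wssse(points, centroid):
    """Within-set sum of squared errors for a single cluster."""
    return sum((p - centroid) ** 2 for p in points)

raw = [80.0, 90.0, 100.0]                # hypothetical raw feature values
centroid = sum(raw) / len(raw)           # 90.0
raw_cost = wssse(raw, centroid)          # 10^2 + 0 + 10^2 = 200.0

scale = 0.01                             # e.g. rescaling features toward [0, 1]
scaled = [p * scale for p in raw]
scaled_cost = wssse(scaled, centroid * scale)

# The cost shrinks by exactly scale**2 -- the clustering itself is unchanged.
print(raw_cost, scaled_cost, raw_cost * scale ** 2)
```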
I haven't thought about this much, but maybe you are looking for something like the silhouette coefficient?

On Oct 30, 2014 5:35 PM, "mgCl2" <florent.jouante...@gmail.com> wrote:
> Hello everyone,
>
> I'm trying to use MLlib's K-means algorithm.
>
> I tried it on raw data. Here is an example of a line contained in my input
> data set:
> 82.9817 3281.4495
>
> with these parameters:
> *numClusters*=4
> *numIterations*=20
>
> Results:
> *WSSSE = 6.375371241589461E9*
>
> Then I normalized my data:
> 0.02219046937793337492 0.97780953062206662508
> With the same parameters, the result is now:
> *WSSSE = 0.04229916511906393*
>
> Is it normal that normalization improves my results?
> Why isn't the WSSSE normalized? It seems that having smaller values
> leads to a smaller WSSSE.
> I'm sure I missed something here!
>
> Florent
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/k-mean-result-interpretation-tp17748.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
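The silhouette coefficient mentioned above can be sketched in plain Python (hypothetical 1-D points, hand-rolled rather than any library call). Its values always lie in [-1, 1], with higher meaning better-separated clusters, which makes it easier to compare across runs than a raw WSSSE:

```python
def silhouette(clusters):
    """Mean silhouette over all points; `clusters` is a list of point lists."""
    def mean_dist(p, pts):
        return sum(abs(p - q) for q in pts) / len(pts)

    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            # a = mean distance to the other points in p's own cluster
            others = [q for q in cluster if q is not p]
            a = mean_dist(p, others) if others else 0.0
            # b = mean distance to the nearest other cluster
            b = min(mean_dist(p, other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

tight = silhouette([[1.0, 1.1], [9.0, 9.1]])   # well-separated clusters
loose = silhouette([[1.0, 5.0], [6.0, 9.0]])   # overlapping clusters
print(tight, loose)  # tight should be close to 1 and larger than loose
```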