Yes, that is exactly it. The two WSSSE values are not comparable, because
normalization also shrinks all distances: scaling every coordinate by a
factor c scales each squared distance, and therefore the WSSSE, by c^2.
Squared error is not an absolute metric.
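Here is a minimal sketch of that effect; the toy points, k, and app name
are made up, but KMeans.train and computeCost are the standard MLlib calls:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors

  val sc = new SparkContext(
    new SparkConf().setAppName("wssse-scale").setMaster("local[*]"))

  // Toy 2-D points on roughly the same scale as the raw data below.
  val raw = sc.parallelize(Seq(
    Vectors.dense(82.9817, 3281.4495),
    Vectors.dense(15.0, 1800.0),
    Vectors.dense(90.0, 3500.0),
    Vectors.dense(10.0, 2100.0))).cache()

  val model = KMeans.train(raw, 2, 20)
  println(s"WSSSE (raw):    ${model.computeCost(raw)}")

  // Divide every coordinate by 1000: each squared distance, and hence
  // the WSSSE, shrinks by 1000^2, even though the clustering is the same.
  val scaled = raw.map(v => Vectors.dense(v.toArray.map(_ / 1000.0))).cache()
  val scaledModel = KMeans.train(scaled, 2, 20)
  println(s"WSSSE (scaled): ${scaledModel.computeCost(scaled)}")

The second WSSSE should come out roughly a factor of 10^6 smaller, purely
because of the rescaling.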

I haven't thought about this much, but maybe you are looking for something
like the silhouette coefficient? It is a ratio of distances, so uniformly
rescaling the data does not change it; see the sketch below.
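MLlib itself doesn't expose a silhouette score, but for a modest number of
points you can compute it locally once you have the assignments (e.g. from
model.predict). A rough, non-distributed sketch; the Point alias and the
silhouette helper here are made up for illustration:

  object Silhouette {
    type Point = Array[Double]

    def dist(a: Point, b: Point): Double =
      math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

    // Mean silhouette over all points, in [-1, 1]; higher means
    // tighter, better-separated clusters.
    def silhouette(assigned: Seq[(Point, Int)]): Double = {
      val byCluster = assigned.groupBy(_._2).mapValues(_.map(_._1)).toMap
      val scores = assigned.map { case (p, c) =>
        def meanDist(pts: Seq[Point]): Double =
          pts.map(dist(p, _)).sum / pts.size
        val own = byCluster(c).filterNot(_ eq p)   // cohesion: a(p)
        val a = if (own.isEmpty) 0.0 else meanDist(own)
        val b = byCluster.collect {                // separation: b(p)
          case (k, pts) if k != c => meanDist(pts)
        }.min
        if (own.isEmpty) 0.0 else (b - a) / math.max(a, b)
      }
      scores.sum / scores.size
    }
  }

With a trained MLlib model you could build the input for small data as
data.collect().map(v => (v.toArray, model.predict(v))).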
On Oct 30, 2014 5:35 PM, "mgCl2" <florent.jouante...@gmail.com> wrote:

> Hello everyone,
>
> I'm trying to use MLlib's K-means algorithm.
>
> I tried it on raw data. Here is an example of a line from my input
> data set:
> 82.9817 3281.4495
>
> with those parameters:
> *numClusters*=4
> *numIterations*=20
>
> results:
> *WSSSE = 6.375371241589461E9*
>
> Then I normalized my data:
> 0.02219046937793337492 0.97780953062206662508
> With the same parameters, result is now:
>  *WSSSE= 0.04229916511906393*
>
> Is it normal that normalization improves my results?
> Why isn't the WSSSE normalized? It seems that having smaller values
> leads to a smaller WSSSE.
> I'm sure I missed something here!
>
> Florent
