Re: [MLlib] kmeans random initialization, same seed every time

Yuhao Yang Tue, 14 Mar 2017 23:42:40 -0700

Hi Julian,

Thanks for reporting this. This is a valid issue and I created
https://issues.apache.org/jira/browse/SPARK-19957 to track it.


Right now the seed is set to this.getClass.getName.hashCode.toLong by
default, which indeed stays the same across multiple fits. Feel free to
leave your comments or send a PR for the fix.
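
To illustrate why that default never changes, here is a minimal sketch in plain Scala. The Estimator class is a hypothetical stand-in and not Spark's actual code; it only mirrors the expression above. Since String.hashCode is deterministic, every instance gets the same seed.

```scala
object DefaultSeedDemo {
  // Hypothetical stand-in for an estimator; NOT Spark's KMeans class.
  class Estimator {
    // Same expression as the default described above: the seed is
    // derived from the class name, so it is a constant per class.
    val defaultSeed: Long = this.getClass.getName.hashCode.toLong
  }

  def main(args: Array[String]): Unit = {
    val first = new Estimator().defaultSeed
    val second = new Estimator().defaultSeed
    // Two independent instances share the same seed value.
    println(s"first == second: ${first == second}")
  }
}
```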

For your problem, you may add .setSeed(new Random().nextLong()) before
fit() as a workaround.
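
A rough sketch of that workaround, assuming the spark.ml KMeans estimator with random initialization. The object name, the helper withFreshSeed, and the value of k are illustrative only:

```scala
import scala.util.Random
import org.apache.spark.ml.clustering.KMeans

object RandomSeedKMeans {
  // Hypothetical helper: builds an estimator whose seed differs on
  // every call, instead of relying on the constant default seed.
  def withFreshSeed(k: Int): KMeans = {
    val seed = new Random().nextLong()
    new KMeans()
      .setK(k)
      .setInitMode("random") // random initialization, as in the question
      .setSeed(seed)
  }
}

// Usage, assuming `data` is a DataFrame with a "features" vector column:
//   val kmeans = RandomSeedKMeans.withFreshSeed(3)
//   val model  = kmeans.fit(data)
//   println(kmeans.getSeed) // record the seed if a run must be reproduced
```

Recording the seed for each run is worth the extra line when you compare clustering results, since it lets you repeat any single run exactly.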

Thanks,
Yuhao

2017-03-14 5:46 GMT-07:00 Julian Keppel <juliankeppel1...@gmail.com>:

> I'm sorry, I left out some important information: I use Spark version
> 2.0.2 with Scala 2.11.8.
>
> 2017-03-14 13:44 GMT+01:00 Julian Keppel <juliankeppel1...@gmail.com>:
>
>> Hi everybody,
>>
>> I am running some experiments with the Spark kmeans implementation of the
>> new DataFrame API. I compare clustering results of different runs with
>> different parameters. I noticed that for random initialization mode, the
>> seed value is the same every time. How is it calculated? In my
>> understanding the seed should be random if it is not provided by the user.
>>
>> Thank you for your help.
>>
>> Julian
>>
>
>
