Hi Julian,

Thanks for reporting this. It is a valid issue, and I created https://issues.apache.org/jira/browse/SPARK-19957 to track it.
Right now the seed defaults to this.getClass.getName.hashCode.toLong, which indeed stays the same across multiple fits. Feel free to leave your comments or send a PR for the fix.

As a workaround for your problem, you can call .setSeed(new Random().nextLong()) before fit().

Thanks,
Yuhao

2017-03-14 5:46 GMT-07:00 Julian Keppel <juliankeppel1...@gmail.com>:

> I'm sorry, I left out some important information. I use Spark version 2.0.2
> with Scala 2.11.8.
>
> 2017-03-14 13:44 GMT+01:00 Julian Keppel <juliankeppel1...@gmail.com>:
>
>> Hi everybody,
>>
>> I am running some experiments with the Spark k-means implementation of the
>> new DataFrame API. I compare clustering results of different runs with
>> different parameters. I noticed that for the random initialization mode, the
>> seed value is the same every time. How is it calculated? In my
>> understanding, the seed should be random if it is not provided by the user.
>>
>> Thank you for your help.
>>
>> Julian
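
P.S. In case it helps, here is a minimal sketch of the workaround above. It assumes the Spark 2.0.x DataFrame API (org.apache.spark.ml.clustering.KMeans); the toy data, the "features" column name, k = 2, the local master and the object name are all made up for illustration and are not taken from your setup.

```scala
import scala.util.Random

import org.apache.spark.ml.clustering.{KMeans, KMeansModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name is only for illustration.
object KMeansRandomSeedExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KMeansRandomSeedExample")
      .master("local[*]") // assumption: local run for experimenting
      .getOrCreate()

    // Toy data (invented): two loose groups of 2-D points.
    val data = spark.createDataFrame(Seq(
      Tuple1(Vectors.dense(0.0, 0.0)),
      Tuple1(Vectors.dense(0.5, 0.5)),
      Tuple1(Vectors.dense(9.0, 9.0)),
      Tuple1(Vectors.dense(9.5, 9.5))
    )).toDF("features")

    // Draw a fresh seed for every fit instead of relying on the default,
    // which is derived from the class name and therefore never changes.
    def fitWithFreshSeed(): KMeansModel =
      new KMeans()
        .setK(2)
        .setInitMode("random")
        .setSeed(new Random().nextLong())
        .fit(data)

    val modelA = fitWithFreshSeed()
    val modelB = fitWithFreshSeed()

    // With different seeds, the random initialization may differ per run,
    // so the resulting centers are no longer guaranteed to be identical.
    println("Run A centers: " + modelA.clusterCenters.mkString(", "))
    println("Run B centers: " + modelB.clusterCenters.mkString(", "))

    spark.stop()
  }
}
```

If you need to reproduce a particular run later, it may be worth logging the seed you drew (for example by storing it in a val before passing it to setSeed) so you can set it explicitly again.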