Thanks for the replies, Zhipeng and Jing. Running OnlineKmeans with a fixed initial model removed the randomness!
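
For anyone who finds this thread later, here is a minimal, self-contained sketch of why the initial model matters when points are processed one at a time. It does not use the Flink ML API and is not how flink-ml implements OnlineKMeans: the class `OnlineKMeansSketch`, its methods, the sample points, and the simple running-mean update are all assumptions made up for illustration. The point is only that with a fixed input order, the cluster assignments depend entirely on the initial centroids, so an unseeded random initial model can give different assignments on every run.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not the flink-ml implementation.
public class OnlineKMeansSketch {

    /** Returns the index of the centroid closest to the point (squared Euclidean distance). */
    static int nearest(double[][] centroids, double[] point) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int j = 0; j < centroids.length; j++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = centroids[j][d] - point[d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = j;
            }
        }
        return best;
    }

    /**
     * Processes the points strictly in input order (batch size 1): each point is
     * assigned to its nearest centroid, then that centroid moves to the running
     * mean of the points assigned to it so far (a simplified update rule).
     */
    static int[] run(double[][] initialCentroids, double[][] points) {
        double[][] centroids = new double[initialCentroids.length][];
        for (int j = 0; j < centroids.length; j++) {
            centroids[j] = initialCentroids[j].clone(); // do not mutate the caller's model
        }
        int[] counts = new int[centroids.length];
        int[] assignments = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            int j = nearest(centroids, points[i]);
            assignments[i] = j;
            counts[j]++;
            for (int d = 0; d < points[i].length; d++) {
                centroids[j][d] += (points[i][d] - centroids[j][d]) / counts[j];
            }
        }
        return assignments;
    }

    /** Draws k centroids uniformly from [0, 10) in each dimension. */
    static double[][] randomCentroids(int k, int dim, Random rnd) {
        double[][] centroids = new double[k][dim];
        for (int j = 0; j < k; j++) {
            for (int d = 0; d < dim; d++) {
                centroids[j][d] = rnd.nextDouble() * 10.0;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.1}, {9.0, 9.2}, {0.3, 0.0}, {9.1, 8.8}};

        // Fixed initial model: the same assignments on every run.
        double[][] fixed = {{0.0, 0.0}, {10.0, 10.0}};
        System.out.println("fixed:  " + Arrays.toString(run(fixed, points)));

        // Unseeded random initial model: the assignments may change between runs.
        double[][] random = randomCentroids(2, 2, new Random());
        System.out.println("random: " + Arrays.toString(run(random, points)));
    }
}
```

Passing a fixed seed to `new Random(...)` would also make the sketch reproducible. In flink-ml itself, the test linked in [1] below shows how a fixed initial model is set up.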

On Sun, Jun 5, 2022 at 6:19 PM Zhipeng Zhang <zhangzhipe...@gmail.com> wrote:

> Hi Natia,
>
> As I understand, the processing order of onlineKmeans is the same as the
> input data.
>
> Are you running OnlineKmeans using one data point with a random initial
> KmeansModel? Could you use a fixed initial model following [1] and try it out?
>
> [1]
> https://github.com/apache/flink-ml/blob/239788f2b1f1f3a4e55ca112517980b598705a15/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering/OnlineKMeansTest.java#L354
>
> On Fri, Jun 3, 2022 at 17:04, Jing Ge <j...@ververica.com> wrote:
>
>> Hi,
>>
>> It seems like an evaluation with a small dataset. In this case, would you
>> like to share your data sample and code? In addition, have you tried KMeans
>> with the same dataset and got inconsistent results too?
>>
>> Best regards,
>> Jing
>>
>> On Fri, Jun 3, 2022 at 4:29 AM Natia Chachkhiani <
>> natia.chachkhia...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am running OnlineKmeans from flink-ml repo on a small dataset. I've
>>> noticed that I don't get consistent results, assignments to clusters,
>>> across different runs. I have set both parallelism and globalBatchSize to 1.
>>> I am doing simple fit and transform on each data point ingested. Is the
>>> order of processing not guaranteed? Or am I missing something?
>>>
>>> Thanks,
>>> Natia
>>>
>>
>
> --
> best,
> Zhipeng
>