Hi, I have another question. Is the implementation of KMeans in flink-ml the same as Spark's StreamingKMeans? Should the accuracy/results on the same dataset be comparable between the two?
On Sun, Jun 5, 2022 at 8:14 PM Natia Chachkhiani <natia.chachkhia...@gmail.com> wrote:

> Thanks for the reply, Zhipeng and Jing.
> Running OnlineKMeans with a fixed initial model removed the randomness!
>
> On Sun, Jun 5, 2022 at 6:19 PM Zhipeng Zhang <zhangzhipe...@gmail.com> wrote:
>
>> Hi Natia,
>>
>> As I understand it, the processing order of OnlineKMeans is the same as
>> that of the input data.
>>
>> Are you running OnlineKMeans using one data point with a random
>> initial KMeansModel? Could you use a fixed initial model following [1]
>> and try it out?
>>
>> [1] https://github.com/apache/flink-ml/blob/239788f2b1f1f3a4e55ca112517980b598705a15/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering/OnlineKMeansTest.java#L354
>>
>> On Fri, Jun 3, 2022 at 17:04, Jing Ge <j...@ververica.com> wrote:
>>
>>> Hi,
>>>
>>> It seems like an evaluation with a small dataset. In this case, would
>>> you like to share your data sample and code? In addition, have you tried
>>> KMeans with the same dataset and got inconsistent results too?
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Fri, Jun 3, 2022 at 4:29 AM Natia Chachkhiani <natia.chachkhia...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running OnlineKMeans from the flink-ml repo on a small dataset.
>>>> I've noticed that I don't get consistent results (assignments to
>>>> clusters) across different runs. I have set both parallelism and
>>>> globalBatchSize to 1.
>>>> I am doing a simple fit and transform on each data point ingested. Is
>>>> the order of processing not guaranteed? Or am I missing something?
>>>>
>>>> Thanks,
>>>> Natia
>>
>> --
>> best,
>> Zhipeng
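
For anyone finding this thread later, here is a minimal sketch of why a fixed initial model removes the run-to-run variation discussed above. It is a generic, single-threaded online k-means written in plain Python, not flink-ml's or Spark's code. The function names, the `decay` parameter, the zero initial weights, and the sample data are all assumptions made for illustration. With a batch size of 1 and in-order processing, the only remaining source of variation is the initial centroids.

```python
def nearest(centroids, point):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(centroids[i], point)),
    )


def online_kmeans(points, init_centroids, decay=1.0):
    """Process points one at a time, in input order (batch size 1).

    Hypothetical sketch: every centroid starts with weight 0, so the first
    point assigned to a cluster replaces that cluster's initial centroid.
    """
    centroids = [list(c) for c in init_centroids]
    weights = [0.0] * len(centroids)
    assignments = []
    for p in points:
        k = nearest(centroids, p)
        assignments.append(k)
        # Discount the weight of everything seen so far.
        weights = [w * decay for w in weights]
        old, new = weights[k], weights[k] + 1.0
        # Move the chosen centroid to the weighted mean of old centroid and p.
        centroids[k] = [(c * old + x) / new for c, x in zip(centroids[k], p)]
        weights[k] = new
    return centroids, assignments


POINTS = [(0, 0), (0, 1), (10, 10), (10, 11)]
FIXED_INIT = [(0, 0), (10, 10)]
SWAPPED_INIT = [(10, 10), (0, 0)]  # same centroids, different order

fixed_centroids, fixed_assignments = online_kmeans(POINTS, FIXED_INIT)
_, swapped_assignments = online_kmeans(POINTS, SWAPPED_INIT)
# fixed_assignments   == [0, 0, 1, 1]
# swapped_assignments == [1, 1, 0, 0]
```

The same input in the same order yields different cluster ids as soon as the initial model changes, even though the grouping is identical. A randomly drawn initial model therefore makes the assignments look inconsistent across runs, while a fixed one makes them repeatable.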