Thanks for the replies, Zhipeng and Jing.
Running OnlineKMeans with a fixed initial model removed the randomness!
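
To illustrate why the initial model matters, here is a minimal sketch in plain Java. It does not use the Flink ML API: the class OnlineKMeansSketch, its methods, the four sample points, and the simplified update rule (batch size 1, no decay factor) are all assumptions made for illustration. With fixed initial centroids the assignments are the same on every run; with unseeded random centroids the cluster ids may change from run to run even though the input order is identical.

```java
import java.util.Arrays;
import java.util.Random;

public class OnlineKMeansSketch {

    // Index of the centroid closest to the point (squared Euclidean distance).
    public static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Simplified online update: points are processed strictly in input order,
    // and each point pulls its nearest centroid toward itself.
    public static int[] run(double[][] points, double[][] initialCentroids) {
        int k = initialCentroids.length;
        double[][] centroids = new double[k][];
        for (int i = 0; i < k; i++) {
            centroids[i] = initialCentroids[i].clone(); // do not mutate the caller's model
        }
        double[] weights = new double[k];
        int[] assignments = new int[points.length];
        for (int p = 0; p < points.length; p++) {
            int best = nearest(points[p], centroids);
            assignments[p] = best;
            weights[best] += 1.0;
            double rate = 1.0 / weights[best];
            for (int d = 0; d < points[p].length; d++) {
                centroids[best][d] += rate * (points[p][d] - centroids[best][d]);
            }
        }
        return assignments;
    }

    // Hypothetical random initial model with coordinates in [0, 10).
    public static double[][] randomCentroids(int k, int dim, Random rnd) {
        double[][] centroids = new double[k][dim];
        for (int i = 0; i < k; i++) {
            for (int d = 0; d < dim; d++) {
                centroids[i][d] = rnd.nextDouble() * 10.0;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.0}, {10.0, 10.0}, {0.5, 0.5}, {9.5, 9.5}};
        double[][] fixed = {{0.0, 0.0}, {10.0, 10.0}};

        // Fixed initial model: prints [0, 1, 0, 1] on every run.
        System.out.println(Arrays.toString(run(points, fixed)));

        // Unseeded random initial model: the output may differ between runs.
        System.out.println(Arrays.toString(run(points, randomCentroids(2, 2, new Random()))));
    }
}
```

In the real job, the equivalent step is to pass a fixed initial KMeans model to OnlineKMeans, as in the test linked at [1] below.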


On Sun, Jun 5, 2022 at 6:19 PM Zhipeng Zhang <zhangzhipe...@gmail.com>
wrote:

> Hi Natia,
>
> As I understand it, OnlineKMeans processes records in the same order as the
> input data.
>
> Are you running OnlineKMeans on one data point at a time with a randomly
> initialized KMeansModel? Could you try a fixed initial model, following [1]?
>
> [1]
> https://github.com/apache/flink-ml/blob/239788f2b1f1f3a4e55ca112517980b598705a15/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering/OnlineKMeansTest.java#L354
>
> On Fri, Jun 3, 2022 at 17:04, Jing Ge <j...@ververica.com> wrote:
>
>> Hi,
>>
>> This looks like an evaluation on a small dataset. In that case, could you
>> share your data sample and code? Also, have you tried KMeans on the same
>> dataset, and did it give inconsistent results too?
>>
>> Best regards,
>> Jing
>>
>> On Fri, Jun 3, 2022 at 4:29 AM Natia Chachkhiani <
>> natia.chachkhia...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am running OnlineKMeans from the flink-ml repo on a small dataset. I've
>>> noticed that I don't get consistent results (cluster assignments) across
>>> different runs. I have set both parallelism and globalBatchSize to 1.
>>> I am doing a simple fit and transform on each data point ingested. Is the
>>> order of processing not guaranteed? Or am I missing something?
>>>
>>> Thanks,
>>> Natia
>>>
>>
>
> --
> best,
> Zhipeng
>
>
