Hi, I have another question. Is the implementation of KMeans in flink-ml
the same as Spark's StreamingKMeans?
Should the accuracy/results on the same dataset be comparable between the
two?
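
For reference, here is a minimal Python sketch of the decay-weighted
mini-batch update as the Spark MLlib guide states it for streaming k-means:
c' = (c * n * a + sum of the batch points assigned to c) / (n * a + m),
n' = n + m. Whether flink-ml's OnlineKMeans follows exactly the same rule is
an assumption I have not verified. The names nearest, update, run and POINTS
are made up for illustration and are not flink-ml or Spark APIs, and
assigning each point right after its own update is a simplification of
"fit and transform".

```python
def nearest(centroids, x):
    """Index of the centroid closest to x (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(centroids[i], x)),
    )


def update(centroids, weights, batch, decay=1.0):
    """One mini-batch step of the decay-weighted update (illustrative only)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    # Assign every point in the batch to its nearest current centroid.
    for x in batch:
        i = nearest(centroids, x)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += x[d]
    new_centroids, new_weights = [], []
    for i in range(k):
        n, m = weights[i], counts[i]
        if m == 0:
            # No new points for this cluster: keep it unchanged.
            new_centroids.append(list(centroids[i]))
            new_weights.append(n)
            continue
        denom = n * decay + m
        new_centroids.append(
            [(centroids[i][d] * n * decay + sums[i][d]) / denom for d in range(dim)]
        )
        new_weights.append(n + m)
    return new_centroids, new_weights


def run(points, init_centroids, init_weights, decay=1.0):
    """Feed points one at a time (batch size 1) in the given order."""
    centroids = [list(c) for c in init_centroids]
    weights = list(init_weights)
    assignments = []
    for x in points:
        centroids, weights = update(centroids, weights, [x], decay)
        assignments.append(nearest(centroids, x))
    return assignments, centroids


# Hypothetical toy data: two well-separated groups.
POINTS = [(0.0, 0.0), (0.1, 0.2), (9.0, 9.0), (9.2, 8.8), (0.2, 0.1), (8.9, 9.1)]

if __name__ == "__main__":
    print(run(POINTS, [(0.0, 0.0), (10.0, 10.0)], [0, 0])[0])
    print(run(POINTS, [(10.0, 10.0), (0.0, 0.0)], [0, 0])[0])
```

With a fixed initial model and a fixed input order the sketch is fully
deterministic; swapping the two initial centroids gives the same grouping
but with the cluster ids exchanged, which is one way that a random initial
model can make assignments look inconsistent between runs.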

On Sun, Jun 5, 2022 at 8:14 PM Natia Chachkhiani <
natia.chachkhia...@gmail.com> wrote:

> Thanks for the reply, Zhipeng and Jing.
> Running OnlineKMeans with a fixed initial model removed the randomness!
>
>
> On Sun, Jun 5, 2022 at 6:19 PM Zhipeng Zhang <zhangzhipe...@gmail.com>
> wrote:
>
>> Hi Natia,
>>
>> As I understand it, OnlineKMeans processes records in the same order as
>> the input data.
>>
>> Are you running OnlineKMeans on one data point at a time with a random
>> initial KMeansModel? Could you try a fixed initial model, following [1]?
>>
>> [1]
>> https://github.com/apache/flink-ml/blob/239788f2b1f1f3a4e55ca112517980b598705a15/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering/OnlineKMeansTest.java#L354
>>
>> Jing Ge <j...@ververica.com> wrote on Fri, Jun 3, 2022 at 17:04:
>>
>>> Hi,
>>>
>>> It seems like an evaluation with a small dataset. In this case, would
>>> you like to share your data sample and code? In addition, have you tried
>>> KMeans with the same dataset, and did you get inconsistent results too?
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Fri, Jun 3, 2022 at 4:29 AM Natia Chachkhiani <
>>> natia.chachkhia...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running OnlineKMeans from the flink-ml repo on a small dataset. I've
>>>> noticed that the results (the cluster assignments) are not consistent
>>>> across different runs. I have set both parallelism and globalBatchSize
>>>> to 1. I am doing a simple fit and transform on each data point ingested.
>>>> Is the order of processing not guaranteed? Or am I missing something?
>>>>
>>>> Thanks,
>>>> Natia
>>>>
>>>
>>
>> --
>> best,
>> Zhipeng
>>
>>
