Hello everyone on the Dev team of Apache Spark.

My name is Manolis Gemeliaris and I am a student at the Hellenic
Mediterranean University (formerly TEI of Crete). For my thesis project I
would like to add an online k-means algorithm to Apache Spark, based on the
paper <https://arxiv.org/abs/1412.5721> by Edo Liberty et al. and the Python
implementation <https://github.com/sviri/kmeans/tree/main/onlineKmeans/src>
by the authors.
From what I have read, getting something like this officially accepted is a
big undertaking that can take a long time. So I would like to develop it as
an open-source third-party package instead, compatible with Apache Spark 3.
I have already read the contribution guidelines for Spark and spent some
time studying the code on GitHub.

I would like to ask if anyone can find the time to help me get started. I
realize that your time is valuable, so any tips that you can share would be
greatly appreciated.

Thank you in advance,
Best Regards,
Manolis Gemeliaris
