Hello to everyone on the Apache Spark dev team. My name is Manolis Gemeliaris and I am a student at the Hellenic Mediterranean University (formerly TEI of Crete). For my thesis project I would like to add an online k-means algorithm to Apache Spark, based on the paper by Edo Liberty et al. (<https://arxiv.org/abs/1412.5721>) and the authors' Python implementation (<https://github.com/sviri/kmeans/tree/main/onlineKmeans/src>). As I understand it, getting something like this officially accepted into Spark is a big undertaking that can take a long time. So instead I would like to develop it as an open-source third-party package compatible with Apache Spark 3. I have already read the Spark contribution guidelines and spent some time studying the code on GitHub.
I would like to ask whether anyone could find the time to help me get started. I realize your time is valuable, so any tips you can share would be greatly appreciated.

Thank you in advance,
Best regards,
Manolis Gemeliaris