Hi,

I'm working on a solution for MATH-1330 and have run into several design issues that I'd like to share, since I want my solution to fit the project's road map and stay consistent with the existing design.

Looking at the Clusterable interface, it seems to dictate how the data must be represented internally: the getPoint() method signature implicitly assumes the data is an array of doubles, and that may not hold in certain cases. IMO replacing getPoint() with getDistanceTo(Clusterable a) could be a better solution, since it doesn't assume anything about the internal representation. On the other hand, it means that Clusterable instances would need to know which DistanceMeasure implementation is used for clustering.
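To make the getDistanceTo() idea more concrete, here is a minimal sketch of what I have in mind. The names DistanceClusterable and SparsePoint are hypothetical and are not part of Commons Math; the generic type parameter is also just my assumption to avoid casts. Note how the Euclidean distance ends up hard-coded inside the point class, which is exactly the coupling to the distance measure I mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative to Clusterable: exposes a distance instead of a double[].
interface DistanceClusterable<T> {
    double getDistanceTo(T other);
}

// Hypothetical sparse point: only the non-zero coordinates are stored.
final class SparsePoint implements DistanceClusterable<SparsePoint> {
    private final Map<Integer, Double> entries = new HashMap<>();

    SparsePoint set(int index, double value) {
        entries.put(index, value);
        return this;
    }

    // Euclidean distance, computed without materializing a dense array.
    // The choice of distance is fixed here, which is the drawback of this design.
    @Override
    public double getDistanceTo(SparsePoint other) {
        double sum = 0.0;
        // Coordinates present in this point (missing ones in 'other' count as 0).
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            double d = e.getValue() - other.entries.getOrDefault(e.getKey(), 0.0);
            sum += d * d;
        }
        // Coordinates present only in 'other'.
        for (Map.Entry<Integer, Double> e : other.entries.entrySet()) {
            if (!entries.containsKey(e.getKey())) {
                sum += e.getValue() * e.getValue();
            }
        }
        return Math.sqrt(sum);
    }
}
```

With this, two points that differ only at indices 0 and 1000000 can be compared without ever allocating a million-element array.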
Therefore I'm not completely sure how to proceed. Moreover, suppose I change getPoint() to return a RealVector: the next issue is deciding how to define and create the cluster centers. When should I use a sparse implementation, and when a dense one?

One possible solution I'm considering is to decouple the seeding of the initial cluster centers from the Lloyd's iterations. That way I can seed the initial centers and pass them as a parameter to the clustering algorithm, which would then move those centers during the iterations instead of creating a new centroid instance each time. While this would work for center-based clustering algorithms, it would not work for others, so I'm not sure how to fit this solution into the current design.

Any thoughts or suggestions?

BR,
Artem.
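
P.S. A rough sketch of the decoupling idea, in case it helps the discussion. This is not how the current clusterers work; LloydSketch and its methods are hypothetical names, and I use plain double[] arrays and squared Euclidean distance only to keep the example short. The caller seeds the centers in whatever representation it likes, and the iteration only overwrites them in place.

```java
import java.util.Arrays;

// Hypothetical sketch: seeding is the caller's job; lloyd() only moves the given centers.
final class LloydSketch {
    private LloydSketch() {}

    // Runs Lloyd's iterations, updating 'centers' in place.
    // Returns, for each point, the index of the center it was assigned to.
    static int[] lloyd(double[][] points, double[][] centers, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        // Scratch buffers allocated once and reused across iterations.
        double[][] sums = new double[centers.length][centers[0].length];
        int[] counts = new int[centers.length];

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest center.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = nearest(points[i], centers);
                if (best != assignment[i]) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
            // Update step: overwrite each center with the mean of its points.
            for (double[] row : sums) {
                Arrays.fill(row, 0.0);
            }
            Arrays.fill(counts, 0);
            for (int i = 0; i < points.length; i++) {
                int c = assignment[i];
                counts[c]++;
                for (int d = 0; d < points[i].length; d++) {
                    sums[c][d] += points[i][d];
                }
            }
            for (int c = 0; c < centers.length; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                for (int d = 0; d < centers[c].length; d++) {
                    centers[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    // Index of the center closest to 'p' (squared Euclidean distance).
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```

Since the centers array is supplied by the caller, the sparse-versus-dense decision would move to the seeding code, but as noted above this only makes sense for center-based algorithms.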