Hello Thomas,

Thank you for the answer. I hope to be able to clarify my schedule for the summer in about a week, and I will then decide whether or not to apply to GSoC this year. I will let you know as soon as I can. Until then, I will briefly describe my first ideas below:

1. Spectral clustering [1] - It maps the data into a lower-dimensional space (relying on the eigenvectors of the graph Laplacian derived from the similarity matrix) and performs (k-means) clustering there. This method can handle a wide variety of problems, regardless of the shape of the clusters. It could be implemented efficiently using the Commons Math linear algebra module.

2. Mean shift algorithm [2] - I haven't grasped all the details of the algorithm yet, but I find it very interesting. As far as I understand, it has been used primarily in pattern recognition and computer vision. I discovered it while searching for an algorithm that does not require the number of clusters as an input parameter. From this point of view, I think it would be a good addition to Commons Math alongside DBSCAN.

3. Clustering evaluation methods

3.1. The Silhouette Coefficient [3] - It accounts for the intra-cluster and inter-cluster distances to assign a score in [-1, 1] to a clustering.

3.2. External clustering evaluation [4] - When a gold standard is available for the clustered data, it can be used to assess the performance of a clustering algorithm.

Suggestions are more than welcome. If you have requests from users for specific clustering algorithms, please let me know.

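To make idea 3.1 a bit more concrete, here is a minimal sketch of the Silhouette Coefficient in plain Java. It is only an illustration under my own assumptions, not a proposal for the final API: it does not use any Commons Math classes, the class and method names (SilhouetteSketch, distance, score) are hypothetical, the distance is assumed to be Euclidean, and cluster labels are assumed to be integers in 0 .. k-1.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Commons Math.
final class SilhouetteSketch {

    /** Euclidean distance between two points of equal dimension (an assumption). */
    static double distance(double[] p, double[] q) {
        double sum = 0.0;
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - q[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Mean silhouette value over all points.
     * labels[i] is the cluster index of points[i], assumed to lie in 0 .. k-1.
     */
    static double score(double[][] points, int[] labels) {
        int n = points.length;
        int k = Arrays.stream(labels).max().orElse(-1) + 1;
        int[] sizes = new int[k];
        for (int label : labels) {
            sizes[label]++;
        }

        double total = 0.0;
        for (int i = 0; i < n; i++) {
            if (sizes[labels[i]] < 2) {
                continue; // singleton cluster: s(i) = 0 by convention
            }
            // Sum of distances from point i to the members of each cluster.
            double[] sumToCluster = new double[k];
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    sumToCluster[labels[j]] += distance(points[i], points[j]);
                }
            }
            // a: mean distance to the other members of the point's own cluster.
            double a = sumToCluster[labels[i]] / (sizes[labels[i]] - 1);
            // b: smallest mean distance to the members of any other cluster.
            double b = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                if (c != labels[i] && sizes[c] > 0) {
                    b = Math.min(b, sumToCluster[c] / sizes[c]);
                }
            }
            double max = Math.max(a, b);
            // With a single cluster b stays infinite; such points contribute 0 here.
            if (max > 0.0 && !Double.isInfinite(b)) {
                total += (b - a) / max;
            }
        }
        return n == 0 ? 0.0 : total / n;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        int[] labels = {0, 0, 1, 1};
        System.out.println("silhouette = " + score(points, labels));
    }
}
```

For two well-separated groups labelled correctly, the score should come out close to 1, and mixing the groups should push it below 0. This version is quadratic in the number of points; a real implementation would probably accept a pluggable distance measure and work on the existing cluster types of the clustering module.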
Best regards,
Alina

[1] http://www.informatik.uni-hamburg.de/ML/contents/people/luxburg/publications/Luxburg07_tutorial.pdf
[2] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1055330&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1055330
[3] http://www.sciencedirect.com/science/article/pii/0377042787901257
[4] http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html

From: Thomas Neidhart <thomas.neidh...@gmail.com>
To: Commons Developers List <dev@commons.apache.org>
Sent: Sunday, February 1, 2015 8:33 PM
Subject: Re: [Math] Contributions to the clustering module (maybe GSoC)

On 02/01/2015 02:06 PM, Alina Ciobanu wrote:
> Hello everyone,
> My name is Alina Ciobanu. I'm a first-year Ph.D. student in computer science
> (NLP) at the Faculty of Mathematics and Computer Science, University of
> Bucharest, Romania. I am interested in contributing to the Apache Commons
> Math library. My idea is to work on the clustering module, to implement
> spectral clustering, maybe also the mean shift algorithm, and some clustering
> validation methods. Would you please tell me if you think that such a
> contribution would be useful to the Commons Math users? If so, I will provide
> more details about what I have in mind. Any suggestions are welcome.
> I am also thinking about applying to Google Summer of Code this year. I
> haven't decided yet because I am not sure, at this moment, if my schedule for
> this summer would allow it. Thus, this question is only in perspective: would
> anyone from the Commons Math community be interested in mentoring a GSoC
> project (on the clustering module, as described above, or on something
> related)?
> Best regards,
> Alina Ciobanu

Hi Alina,

good to hear about your interest on commons-math. New contributions are very welcome, and we have indeed several feature requests to add new clustering algorithms.
I am certainly interested in mentoring you for GSOC, but there are maybe also others that can help with that here. Just let us know what you want to do early on so that we can prepare ourselves.

Thomas