Hello Thomas,

Thank you for the answer. I hope to be able to clarify my summer schedule in 
about a week, and I will then decide whether to apply to GSoC this year. I will 
let you know as soon as I can. Until then, I briefly describe my first ideas 
below:


1. Spectral clustering [1] - It maps the data into a lower-dimensional space 
(using the eigenvectors of the similarity matrix, or more precisely of its 
graph Laplacian) and performs (k-means) clustering there. Because it relies 
only on pairwise similarities, it can recover clusters of arbitrary shape, 
including non-convex ones. It could be implemented efficiently using the 
Commons Math linear algebra module.
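
To make the idea in item 1 more concrete, here is a rough sketch of the 
two-cluster special case in plain Java. It is only an illustration under my 
own assumptions: the class and method names are hypothetical, the points are 
one-dimensional, the similarity is Gaussian with an assumed bandwidth sigma, 
and the split uses the sign of the second eigenvector of the unnormalized 
Laplacian (found by power iteration) instead of k-means on several 
eigenvectors. A real implementation would presumably rely on the Commons Math 
linear algebra module for the eigendecomposition.

```java
import java.util.Arrays;

/** Hypothetical sketch: two-way spectral clustering of 1-D points (not Commons Math API). */
final class SpectralBipartitionSketch {

    /** Gaussian similarity matrix with zero diagonal; sigma is an assumed bandwidth. */
    static double[][] similarity(double[] points, double sigma) {
        int n = points.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    double d = points[i] - points[j];
                    w[i][j] = Math.exp(-d * d / (2 * sigma * sigma));
                }
            }
        }
        return w;
    }

    /** Labels each point 0 or 1 by the sign of the Fiedler vector of L = D - W. */
    static int[] bipartition(double[][] w, int iterations) {
        int n = w.length;
        double[] degree = new double[n];
        double maxDegree = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                degree[i] += w[i][j];
            }
            maxDegree = Math.max(maxDegree, degree[i]);
        }
        // 2 * maxDegree bounds the eigenvalues of L, so (shift * I - L) is positive semidefinite.
        double shift = 2 * maxDegree;
        double[] v = new double[n];
        for (int i = 0; i < n; i++) {
            v[i] = i; // arbitrary deterministic start vector
        }
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double mean = 0;
            for (int i = 0; i < n; i++) {
                double lv = degree[i] * v[i]; // (L v)_i = d_i v_i - sum_j w_ij v_j
                for (int j = 0; j < n; j++) {
                    lv -= w[i][j] * v[j];
                }
                next[i] = shift * v[i] - lv;
                mean += next[i] / n;
            }
            // Remove the constant component (eigenvalue 0 of L), then normalize.
            double norm = 0;
            for (int i = 0; i < n; i++) {
                next[i] -= mean;
                norm += next[i] * next[i];
            }
            norm = Math.sqrt(norm);
            for (int i = 0; i < n; i++) {
                next[i] /= norm;
            }
            v = next;
        }
        int[] labels = new int[n];
        for (int i = 0; i < n; i++) {
            labels[i] = v[i] >= 0 ? 1 : 0;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[] points = {0.0, 0.1, 0.2, 5.0, 5.1, 5.2};
        int[] labels = bipartition(similarity(points, 1.0), 200);
        System.out.println(Arrays.toString(labels));
    }
}
```

The sketch assumes at least two points and a connected similarity graph. For 
more than two clusters, the usual approach is to embed each point using 
several eigenvectors and run k-means on the embedded points.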


2. Mean shift algorithm [2] - I haven't grasped all the details of the 
algorithm yet, but I find it very interesting. As far as I understand, it has 
been used primarily in pattern recognition and computer vision. I discovered 
it while searching for an algorithm that does not require the number of 
clusters as an input parameter. From this point of view, I think it would be a 
good addition to Commons Math alongside DBSCAN.
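
As a rough illustration of item 2, the following plain Java sketch shows how 
I currently understand the procedure: every point is moved repeatedly to the 
kernel-weighted mean of the data around it until it stops moving, and points 
that end up at the same density mode form one cluster. The names, the 
Gaussian kernel, the one-dimensional data, and the rule for merging modes 
(closer than half the bandwidth) are all my own assumptions for the example. 
Note that the bandwidth is the only parameter; the number of clusters is not 
an input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of mean shift on 1-D data with a Gaussian kernel. */
final class MeanShiftSketch {

    /** Moves a starting position uphill until it settles on a density mode. */
    static double shiftToMode(double start, double[] data, double bandwidth) {
        double x = start;
        for (int it = 0; it < 500; it++) {
            double weightedSum = 0;
            double weightTotal = 0;
            for (double p : data) {
                double u = (p - x) / bandwidth;
                double k = Math.exp(-0.5 * u * u); // Gaussian kernel weight
                weightedSum += k * p;
                weightTotal += k;
            }
            double next = weightedSum / weightTotal; // kernel-weighted mean
            if (Math.abs(next - x) < 1e-9) {
                return next;
            }
            x = next;
        }
        return x;
    }

    /** Assigns a label to each point; points sharing a mode share a label. */
    static int[] cluster(double[] data, double bandwidth) {
        List<Double> modes = new ArrayList<>();
        int[] labels = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            double mode = shiftToMode(data[i], data, bandwidth);
            int label = -1;
            for (int m = 0; m < modes.size(); m++) {
                // Assumed merge rule: modes closer than half the bandwidth are the same mode.
                if (Math.abs(modes.get(m) - mode) < bandwidth / 2) {
                    label = m;
                    break;
                }
            }
            if (label < 0) {
                modes.add(mode);
                label = modes.size() - 1;
            }
            labels[i] = label;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 1.1, 0.9, 5.0, 5.2, 4.9};
        System.out.println(Arrays.toString(cluster(data, 0.5)));
    }
}
```

The result depends strongly on the bandwidth: a larger value merges nearby 
groups, a smaller one splits them.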


3. Clustering evaluation methods

3.1. The Silhouette Coefficient [3] - uses the intra-cluster and inter-cluster 
distances to assign a score in [-1, 1] to a clustering.

3.2. External clustering evaluation [4] - when a gold standard is available 
for the clustered data, it can be used to assess the performance of a 
clustering algorithm.
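
For item 3.1, the computation could look roughly like the plain Java sketch 
below. For each point, a is the mean distance to the other points of its own 
cluster and b is the smallest mean distance to the points of any other 
cluster; the point's score is (b - a) / max(a, b), and the coefficient is the 
average over all points. The class and method names are hypothetical, the 
distance is assumed to be Euclidean, labels are assumed to be integers in 
[0, k), and at least two non-empty clusters are assumed.

```java
/** Hypothetical sketch of the silhouette coefficient with Euclidean distance. */
final class SilhouetteSketch {

    static double distance(double[] p, double[] q) {
        double sum = 0;
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - q[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Mean silhouette over all points; labels[i] must lie in [0, k). */
    static double silhouette(double[][] points, int[] labels, int k) {
        int n = points.length;
        double total = 0;
        for (int i = 0; i < n; i++) {
            // Sum of distances from point i to the points of each cluster.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int j = 0; j < n; j++) {
                if (j == i) {
                    continue;
                }
                sum[labels[j]] += distance(points[i], points[j]);
                count[labels[j]]++;
            }
            int own = labels[i];
            if (count[own] == 0) {
                continue; // singleton cluster: score 0 by convention
            }
            double a = sum[own] / count[own];
            double b = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                if (c != own && count[c] > 0) {
                    b = Math.min(b, sum[c] / count[c]);
                }
            }
            double max = Math.max(a, b);
            if (max > 0) {
                total += (b - a) / max;
            }
        }
        return total / n;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 0}, {10, 1}};
        System.out.println(silhouette(points, new int[] {0, 0, 1, 1}, 2));
        System.out.println(silhouette(points, new int[] {0, 1, 0, 1}, 2));
    }
}
```

In the small example, the first labeling groups nearby points together and 
scores close to 1, while the second mixes the two groups and scores below 0.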


Suggestions are more than welcome. If you have requests from users for specific 
clustering algorithms, please let me know.


Best regards,
Alina


[1] http://www.informatik.uni-hamburg.de/ML/contents/people/luxburg/publications/Luxburg07_tutorial.pdf
[2] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1055330&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1055330
[3] http://www.sciencedirect.com/science/article/pii/0377042787901257
[4] http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
  
      From: Thomas Neidhart <thomas.neidh...@gmail.com>
 To: Commons Developers List <dev@commons.apache.org> 
 Sent: Sunday, February 1, 2015 8:33 PM
 Subject: Re: [Math] Contributions to the clustering module (maybe GSoC)
   
On 02/01/2015 02:06 PM, Alina Ciobanu wrote:


> Hello everyone,
> My name is Alina Ciobanu. I'm a first-year Ph.D. student in computer science 
> (NLP) at the Faculty of Mathematics and Computer Science, University of 
> Bucharest, Romania. I am interested in contributing to the Apache Commons 
> Math library. My idea is to work on the clustering module, to implement 
> spectral clustering, maybe also the mean shift algorithm, and some clustering 
> validation methods. Would you please tell me if you think that such a 
> contribution would be useful to the Commons Math users? If so, I will provide 
> more details about what I have in mind. Any suggestions are welcome.
> I am also thinking about applying to Google Summer of Code this year. I 
> haven't decided yet because I am not sure, at this moment, if my schedule for 
> this summer would allow it. Thus, this question is only in perspective: would 
> anyone from the Commons Math community be interested in mentoring a GSoC 
> project (on the clustering module, as described above, or on something 
> related)?
> Best regards,
> Alina Ciobanu

Hi Alina,

Good to hear about your interest in commons-math. New contributions are
very welcome, and we do indeed have several feature requests for new
clustering algorithms.

I am certainly interested in mentoring you for GSoC, but there may also
be others here who can help with that.

Just let us know what you want to do early on so that we can prepare
ourselves.

Thomas




  
