On 2/7/15 1:53 PM, Alina Ciobanu wrote:
> Hello,
>
> I finally figured out my schedule for this summer, and the conclusion is that
> I would be able to dedicate about 20 hours per week to the GSoC project. As
> far as I understand, this is about half of what is expected from a GSoC
> student, so unfortunately I think I should not apply this year. I want to
> contribute to the Commons Math library nonetheless.
Patches / review / ideas are always welcome!

Phil

>
> Best regards,
> Alina
>
> From: Thomas Neidhart <thomas.neidh...@gmail.com>
> To: Commons Developers List <dev@commons.apache.org>
> Sent: Tuesday, February 3, 2015 1:17 AM
> Subject: Re: [Math] Contributions to the clustering module (maybe GSoC)
>
> On 02/02/2015 10:36 PM, Alina Ciobanu wrote:
>> Hello Thomas,
>>
>> Thank you for the answer. I hope to clarify my schedule for the summer
>> within about a week, and then I will decide whether to apply to GSoC
>> this year. I will let you know as soon as I can. Until then, here is a
>> short description of my first ideas:
>>
>> 1. Spectral clustering [1] - It maps the data into a lower-dimensional
>> space (using the eigenvectors of the similarity matrix) and performs
>> (k-means) clustering there. This method works on a wide variety of
>> problems, regardless of the shape of the clusters. It could be
>> implemented efficiently using the Commons Math linear algebra module.
>>
>> 2. Mean shift algorithm [2] - I haven't grasped all the details of the
>> algorithm yet, but I find it very interesting. As far as I understand,
>> it has been used primarily in pattern recognition and computer vision.
>> I discovered it while searching for an algorithm that does not require
>> the number of clusters as an input parameter. From this point of view,
>> I think it would be a good addition to Commons Math alongside DBSCAN.
>>
>> 3. Clustering evaluation methods
>> 3.1. The Silhouette Coefficient [3] - accounts for the intra-cluster and
>> inter-cluster distances to assign a score in [-1, 1] to a clustering.
>> 3.2. External clustering evaluation [4] - when a gold standard is
>> available for the clustered data, it can be used to assess the
>> performance of a clustering algorithm.
>>
>> Suggestions are more than welcome. If you have requests from users for
>> specific clustering algorithms, please let me know.
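The Silhouette Coefficient from item 3.1 is simple enough to sketch directly. The following is a minimal illustration in plain Java, not Commons Math API: the class name `SilhouetteSketch`, the label-array input format, and the use of Euclidean distance are assumptions made for this example. For each point i, a(i) is the mean distance to the other members of its own cluster, b(i) is the smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)); the score of the whole clustering is the mean of s(i), which lies in [-1, 1].

```java
/**
 * Hypothetical helper (not part of Commons Math): mean silhouette coefficient
 * of a hard clustering, assuming Euclidean distance.
 */
public final class SilhouetteSketch {

    private SilhouetteSketch() {}

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < p.length; k++) {
            double d = p[k] - q[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * @param points      data points, one row per point (assumed non-empty)
     * @param labels      cluster index of each point, in [0, numClusters)
     * @param numClusters number of clusters (assumed >= 2, none empty)
     * @return mean silhouette value over all points, in [-1, 1]
     */
    public static double silhouette(double[][] points, int[] labels, int numClusters) {
        int n = points.length;
        int[] sizes = new int[numClusters];
        for (int label : labels) {
            sizes[label]++;
        }
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            // Sum of distances from point i to every other point, per cluster.
            double[] sumDist = new double[numClusters];
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    sumDist[labels[j]] += distance(points[i], points[j]);
                }
            }
            int own = labels[i];
            if (sizes[own] == 1) {
                continue; // s(i) = 0 by convention for singleton clusters
            }
            // a(i): mean distance to the other members of its own cluster.
            double a = sumDist[own] / (sizes[own] - 1);
            // b(i): smallest mean distance to the members of another cluster.
            double b = Double.POSITIVE_INFINITY;
            for (int c = 0; c < numClusters; c++) {
                if (c != own && sizes[c] > 0) {
                    b = Math.min(b, sumDist[c] / sizes[c]);
                }
            }
            double max = Math.max(a, b);
            total += (max == 0.0) ? 0.0 : (b - a) / max;
        }
        return total / n;
    }

    public static void main(String[] args) {
        // Two tight, well-separated groups: the score should be close to 1.
        double[][] points = { {0.0, 0.0}, {0.0, 1.0}, {10.0, 0.0}, {10.0, 1.0} };
        int[] labels = {0, 0, 1, 1};
        System.out.println(silhouette(points, labels, 2));
    }
}
```

For the two well-separated groups in `main`, the sketch prints a value of about 0.90; assigning the same four points to clusters that mix the two groups yields a negative score. The O(n²) pairwise loop is the straightforward choice; a real implementation might reuse a precomputed distance matrix or plug in a configurable distance measure.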
> Your proposals sound good. As a pointer to already existing feature
> requests, you can take a look at:
>
> * OPTICS algorithm - https://issues.apache.org/jira/browse/MATH-1190
> * HAC algorithm - https://issues.apache.org/jira/browse/MATH-959
>
> Cluster evaluation would also be very interesting; I already wanted to
> do something in this direction but could not find the time.
>
> BTW, by coincidence, we received a reminder about this year's GSoC just
> today: the deadline to submit a project proposal with project ideas is
> 13-02-2015.
>
> Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org