Hi All,

I am Tao Lin, a senior Computer Science student highly interested in Data Science (Distributed Computing, Machine Learning, Visualization, etc.). I'd like to join Google Summer of Code 2016 and contribute to Spark this year.

When I was viewing the past GSoC projects, I was impressed by "Enhance MLlib's Python API", completed by Manoj Kumar (mentored by Xiangrui Meng) last year. I look forward to writing something as meaningful and impactful as what they did.

The organization list of GSoC 2016 hasn't been released yet, but I'd like to join the community and make a start on solving real problems as soon as possible. Is there anyone who's going to sign up as a mentor for GSoC this year? Maybe you could tell me about the projects you are going to mentor and give me some suggestions about what issues I could fix now to get started.

Thanks!
Here is more information about myself and my related experience:

I'm going to pursue my graduate study in the US after this summer. (I have received an offer from U Wisconsin–Madison, and I'm waiting for more admissions.) Since I am free this spring and summer, I can devote myself fully to open-source development.

I am quite familiar with Spark. I did research on data visualization in the Visual Analytics Group, State Key Lab of CAD&CG for more than two years. I administered a cluster with more than 20 nodes for more than one year there, and I helped the whole group preprocess large datasets on the cluster with Hadoop and Spark. In one of the projects, I independently used Spark to process 14 billion trajectory records (about 1.8 TB) on the cluster.

The highlight of my professional experience has been working as a visiting intern in the HKUST Multimedia Technology Research Center. Through that experience, I have not only improved my programming skills, but also learned how to work better in a large software engineering team by applying software engineering techniques (like unit testing and code review) and how to communicate cross-culturally.

As for programming languages, I'm good at Java and Python, and I believe I could pick up Scala or R quickly if the project needs it. (Besides, I'm experienced in C++ and JavaScript, which are unlikely to be used in Spark projects.)

You can also view my CV at http://nblintao.github.io/pdf/Tao_Lin_CV.pdf

Thanks for your time!

Best Regards,
Tao

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Interested-in-Contributing-to-Spark-as-GSoC-2016-tp16211.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.