It would be great to get more contributions! If you're new to contributing, it's best to start with some small contributions and check out the Contributing to Spark guide: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
But if those build up to a larger contribution, the top ones I'd pick out are:

SPARK-6442 (local linear algebra): This could be done incrementally, and should be coordinated on that JIRA since I believe others may be working on it.

SPARK-3703 (Ensemble algorithms): It would be great to get a generic boosting algorithm under the Pipelines API (probably AdaBoost).

SPARK-5992 (LSH): I believe there is active work on this, so it would be important to coordinate via JIRA on that.

The other JIRAs which Feynman & I did not comment on either have some active work or are likely lower priority. However, if you're interested in one of those algorithms, you could publish it as a Spark package: http://spark-packages.org/

Good luck!
Joseph

On Thu, Jul 9, 2015 at 1:20 PM, Feynman Liang <fli...@databricks.com> wrote:

> Exciting, thanks for the contribution! I'm currently aware of:
>
> - SPARK-8499 is currently in progress (in a duplicate issue); I updated the JIRA to reflect that.
> - SPARK-5992 has a spark package <http://spark-packages.org/package/mrsqueeze/spark-hash> linked but I'm unclear on whether there is any progress there.
>
> Feynman
>
> On Thu, Jul 9, 2015 at 1:04 PM, emrehan <emrehan.tu...@gmail.com> wrote:
>
>> Hi all,
>>
>> We could contribute a feature to Spark MLlib by May 2016 and make it count as our undergraduate senior project.
>> The following issues seem interesting to us:
>>
>> * https://issues.apache.org/jira/browse/SPARK-2273 – Online learning algorithms: Passive Aggressive
>> * https://issues.apache.org/jira/browse/SPARK-2335 – K-Nearest Neighbor classification and regression for MLLib
>> * https://issues.apache.org/jira/browse/SPARK-2401 – AdaBoost.MH, a multi-class multi-label classifier
>> * https://issues.apache.org/jira/browse/SPARK-4251 – Add Restricted Boltzmann machine(RBM) algorithm to MLlib
>> * https://issues.apache.org/jira/browse/SPARK-4752 – Classifier based on artificial neural network
>> * https://issues.apache.org/jira/browse/SPARK-5575 – Artificial neural networks for MLlib deep learning
>> * https://issues.apache.org/jira/browse/SPARK-5992 – Locality Sensitive Hashing (LSH) for MLlib
>> * https://issues.apache.org/jira/browse/SPARK-6425 – Add parallel Q-learning algorithm to MLLib
>> * https://issues.apache.org/jira/browse/SPARK-6442 – Local Linear Algebra Package
>> * https://issues.apache.org/jira/browse/SPARK-8499 – NaiveBayes implementation for MLPipeline
>>
>> All of these tickets are marked unassigned but have some work done on them. Are any of these issues unsuitable for us as a senior project?
>>
>> Kind regards,
>> Can Giracoglu, Emrehan Tuzun, Remzi Can Aksoy, Saygin Dogu