Hello everybody,

Xiangrui, thanks for the link to the roadmap. I see that LDA is planned for MLlib 1.1. What do you think about PLSA?
I understand that LDA is more popular now, but recent research shows that modifications of PLSA sometimes perform better [1]. Furthermore, the most recent paper by the same authors shows that there is a clear way to extend PLSA to LDA and beyond [2]. We could implement PLSA with these modifications in MLlib. Would that be of interest? We already have an implementation of Robust PLSA on Spark, so the task would be to integrate it into MLlib.

[1] A. Potapenko, K. Vorontsov. 2013. Robust PLSA performs better than LDA. In Proceedings of ECIR'13.
[2] K. Vorontsov, A. Potapenko. Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization. http://www.machinelearning.ru/wiki/images/1/1f/Voron14aist.pdf

Best regards,
Denis