Hello everybody,

Xiangrui, thanks for the link to the roadmap. I saw that LDA is planned for
MLlib 1.1. What do you think about PLSA?

I understand that LDA is more popular now, but recent research shows that a
modified (robust) PLSA can perform better than LDA [1]. Furthermore, the most
recent paper by the same authors shows that there is a clear way to extend PLSA
to LDA and beyond [2]. We could implement PLSA with these modifications in
MLlib. Would that be of interest?
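
For reference, the extension in [2] (additive regularization of topic models) can be sketched as the EM updates below. This is a condensed transcription, so the paper's own notation should be taken as authoritative. Here n_dw is the count of word w in document d, phi_wt = p(w|t), theta_td = p(t|d), R(Phi, Theta) is the added regularizer, (x)_+ = max(x, 0), and "proportional to" means normalizing over w for phi and over t for theta.

```latex
\begin{aligned}
p_{tdw} &= \frac{\phi_{wt}\,\theta_{td}}{\sum_{s}\phi_{ws}\,\theta_{sd}}, \qquad
n_{wt} = \sum_{d} n_{dw}\,p_{tdw}, \qquad
n_{td} = \sum_{w} n_{dw}\,p_{tdw}, \\
\phi_{wt} &\propto \Bigl(n_{wt} + \phi_{wt}\,\frac{\partial R}{\partial \phi_{wt}}\Bigr)_{+}, \qquad
\theta_{td} \propto \Bigl(n_{td} + \theta_{td}\,\frac{\partial R}{\partial \theta_{td}}\Bigr)_{+}.
\end{aligned}
```

With R = 0 these are the ordinary PLSA EM updates. Choosing R = sum_{t,w} beta_w ln phi_wt + sum_{d,t} alpha_t ln theta_td turns the M-step into phi_wt proportional to (n_wt + beta_w)_+ and theta_td proportional to (n_td + alpha_t)_+, which is MAP estimation under Dirichlet priors with parameters beta_w + 1 and alpha_t + 1. This is the sense in which PLSA extends towards LDA-style smoothing.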

Actually, we already have an implementation of Robust PLSA on top of Spark, so
the task would be to integrate it into MLlib.
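
For readers less familiar with PLSA, here is a minimal single-machine sketch of the EM iteration in plain Python. It is not our Spark implementation and not the robust variant from [1]; the function name `plsa` and the optional `beta`/`alpha` smoothing terms (the simplest additive regularizer in the spirit of [2], applied uniformly to all words and topics) are illustrative assumptions.

```python
import random


def plsa(counts, num_topics, iters=50, beta=0.0, alpha=0.0, seed=0):
    """Fit PLSA by EM on a dense document-word count matrix.

    counts[d][w] is n_dw. beta/alpha are hypothetical additive smoothing
    terms (uniform ARTM-style regularizer); beta = alpha = 0 is plain PLSA.
    Returns (phi, theta) with phi[w][t] = p(w|t) and theta[t][d] = p(t|d).
    """
    rng = random.Random(seed)
    D, W, T = len(counts), len(counts[0]), num_topics

    def normalize_cols(m):
        # Scale each column to sum to 1; an all-zero column stays all zeros.
        for j in range(len(m[0])):
            s = sum(m[i][j] for i in range(len(m)))
            for i in range(len(m)):
                m[i][j] = m[i][j] / s if s > 0 else 0.0
        return m

    # Random positive initialization of both stochastic matrices.
    phi = normalize_cols([[rng.random() for _ in range(T)] for _ in range(W)])
    theta = normalize_cols([[rng.random() for _ in range(D)] for _ in range(T)])

    for _ in range(iters):
        n_wt = [[0.0] * T for _ in range(W)]
        n_td = [[0.0] * D for _ in range(T)]
        for d in range(D):
            for w in range(W):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: p(t|d,w) is proportional to phi_wt * theta_td.
                p = [phi[w][t] * theta[t][d] for t in range(T)]
                z = sum(p)
                if z == 0:
                    continue
                # Accumulate expected counts n_wt and n_td.
                for t in range(T):
                    share = n * p[t] / z
                    n_wt[w][t] += share
                    n_td[t][d] += share
        # M-step: add the smoothing term, clip at zero, renormalize.
        phi = normalize_cols(
            [[max(n_wt[w][t] + beta, 0.0) for t in range(T)] for w in range(W)])
        theta = normalize_cols(
            [[max(n_td[t][d] + alpha, 0.0) for d in range(D)] for t in range(T)])
    return phi, theta


# Usage on a toy corpus: 3 documents over a 4-word vocabulary.
counts = [
    [4, 3, 0, 0],
    [0, 0, 5, 2],
    [3, 4, 1, 0],
]
phi, theta = plsa(counts, num_topics=2, iters=100)
```

A Spark version would presumably distribute the E-step over documents: theta is per-document and can stay local to each partition, so only the n_wt counts need to be summed across partitions (for example with a reduce) before updating phi.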

1. A. Potapenko, K. Vorontsov. Robust PLSA performs better than LDA. In
Proceedings of ECIR'13, 2013.
2. K. Vorontsov, A. Potapenko. Tutorial on Probabilistic Topic Modeling: Additive
Regularization for Stochastic Matrix Factorization.
http://www.machinelearning.ru/wiki/images/1/1f/Voron14aist.pdf

Best regards,
Denis.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Contribution-to-Spark-MLLib-tp7716p7844.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
