Re: Contribution to Spark MLLib

Debasish Das Wed, 13 Aug 2014 08:41:42 -0700

Denis,

If it is PLSA with a least-squares loss, then the QuadraticMinimizer that
we open-sourced should be able to solve it for a modest number of topics
(up to about 1000, I believe). If we integrate a CG solver for the
equality constraint (Nocedal's KNITRO paper is the reference), the number
of topics could grow well beyond the usual ALS ranks of 50-400.
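To make the least-squares PLSA subproblem concrete: with the word-by-topic
matrix A held fixed, each document's topic mixture x solves
min 0.5 * ||A x - b||^2 subject to x >= 0 and sum(x) = 1, which is a
quadratic program over the probability simplex. The sketch below assumes
that this simplex constraint is what the equality-constrained formulation
amounts to. It swaps in plain projected gradient with a sort-based simplex
projection and does not reproduce the QuadraticMinimizer's own algorithm;
all function names are hypothetical.

```python
# Hypothetical sketch (not the QuadraticMinimizer's algorithm):
#   minimize 0.5 * ||A x - b||^2  subject to  x >= 0, sum(x) == 1
# solved with projected gradient onto the probability simplex.


def normal_equations(A, b):
    """Return H = A^T A and c = -A^T b, so the objective is 0.5 x^T H x + c^T x + const."""
    rows, k = len(A), len(A[0])
    H = [[sum(A[r][i] * A[r][j] for r in range(rows)) for j in range(k)] for i in range(k)]
    c = [-sum(A[r][i] * b[r] for r in range(rows)) for i in range(k)]
    return H, c


def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) == 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:  # holds for a prefix of j values; the last one fixes theta
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def solve_simplex_qp(H, c, iters=500):
    """Minimize 0.5 x^T H x + c^T x over the probability simplex (H assumed PSD)."""
    n = len(c)
    # For a PSD matrix the trace upper-bounds the largest eigenvalue,
    # so a step of 1/trace(H) is safe, if conservative.
    L = sum(H[i][i] for i in range(n)) or 1.0
    x = [1.0 / n] * n  # start at the center of the simplex
    for _ in range(iters):
        grad = [sum(H[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
        x = project_to_simplex([x[i] - grad[i] / L for i in range(n)])
    return x


# Toy usage: 2 words, 2 topics, one document.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.5, 0.0]
H, c = normal_equations(A, b)
x = solve_simplex_qp(H, c)  # approximately [0.75, 0.25]
```

Projected gradient is used here only because it is short; each iteration
is a matrix-vector product plus an O(k log k) sort, so the cost grows with
the number of topics k, which is why the solver choice matters for large
topic counts.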

Please look at the following JIRA and see whether Formulation 4 fits your
use case. We will be using it internally for topic modeling.

https://issues.apache.org/jira/browse/SPARK-2426

If you need convex costs like KL divergence, which I believe is what most
PLSA implementations use, that is not supported right now. Internally we
decided to start with least-squares loss and move to KL divergence later
if we need better cluster purity.
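For comparison, the two objectives can be written down for a nonnegative
count matrix V and its reconstruction WH. This minimal sketch assumes the
generalized (unnormalized) KL divergence commonly used for NMF/PLSA-style
models; the function names are hypothetical and nothing here is MLlib API.

```python
import math


def least_squares_loss(V, WH):
    """0.5 * sum of squared residuals between data V and reconstruction WH."""
    return 0.5 * sum((v - a) ** 2 for vr, ar in zip(V, WH) for v, a in zip(vr, ar))


def generalized_kl_loss(V, WH):
    """Generalized KL divergence D(V || WH) = sum(v*log(v/a) - v + a), with 0*log(0) = 0.

    Assumes every reconstructed entry a is strictly positive wherever v > 0.
    """
    total = 0.0
    for vr, ar in zip(V, WH):
        for v, a in zip(vr, ar):
            if v > 0:
                total += v * math.log(v / a)
            total += a - v
    return total


V = [[1.0, 2.0]]       # toy observed counts
approx = [[1.0, 1.0]]  # toy reconstruction
print(least_squares_loss(V, approx))   # 0.5
print(generalized_kl_loss(V, approx))  # 2*ln(2) - 1, about 0.386
```

The squared loss is quadratic in the unknown factor, which is what lets a
QP solver handle it directly; the KL term is convex but not quadratic and
needs the reconstruction to stay strictly positive wherever the data is
positive, so it calls for a different solver.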

Thanks.
Deb

On Jun 18, 2014 11:39 PM, "Xiangrui Meng" <men...@gmail.com> wrote:

> Denis, I think it is fine to have PLSA in MLlib. But I'm not familiar
> with the modification you mentioned, since the paper is new. We may
> need to spend more time learning the trade-offs. Feel free to create a
> JIRA for PLSA and we can move our discussion there. It would be great
> if you could share your current implementation, so that it is easy for
> developers to join the discussion.
>
> Jayati, it is certainly NOT mandatory. But if you want to contribute
> something new, please create a JIRA first.
>
> Best,
> Xiangrui
>
