Thanks, Ken, but I am planning to use Spark LDA in production and cannot wait for a future release. Could someone at least suggest a workaround?
PS: SPARK-5567 <https://issues.apache.org/jira/browse/SPARK-5567> mentions: "This will require inference but should be able to use the same code, with a few modification to keep the inferred topics fixed." Can somebody elaborate on this? Does it mean "folding-in" with EM? Or can I simply sum the topic distributions of the terms in the new document?

On Fri, May 22, 2015 at 2:23 PM, Ken Geis <[email protected]> wrote:

> Dani, this appears to be addressed in SPARK-5567
> <https://issues.apache.org/jira/browse/SPARK-5567>, scheduled for Spark
> 1.5.0.
>
>
> Ken
>
> On May 21, 2015, at 11:12 PM, [email protected] wrote:
>
> *From: *Dani Qiu <[email protected]>
> *Subject: **LDA prediction on new document*
> *Date: *May 21, 2015 at 8:48:40 PM PDT
> *To: *[email protected]
>
>
> Hi, guys, I'm pretty new to LDA. I notice spark 1.3.0 mllib provide EM
> based LDA implementation. It returns both topics and topic distribution.
>
> My question is how can I use these parameters to predict on new document ?
>
> And I notice there is an Online LDA implementation in spark master branch,
> it only returns topics , how can I use this to do prediction on new
> document (and trained document) ?
>
>
> thanks
>
>
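
To make the "folding-in" question concrete, here is a minimal sketch of what I have in mind. It is plain Python, not Spark API, and it is only my assumption about what "keep the inferred topics fixed" could mean: the topic-term probabilities from training stay untouched, and EM re-estimates only the topic mixture of the new document. The names `fold_in`, `word_counts`, `topics` and `alpha` are hypothetical.

```python
def fold_in(word_counts, topics, alpha=1.0, iterations=50):
    """Estimate the topic mixture of one new document, topics held fixed.

    word_counts: dict of term index -> count in the new document
    topics:      list of K lists, topics[k][w] = P(term w | topic k),
                 e.g. exported from a trained model (assumed format)
    alpha:       symmetric Dirichlet prior on the mixture; must be >= 1,
                 and 1.0 means no smoothing (plain maximum likelihood)
    """
    k = len(topics)
    theta = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(iterations):
        # Prior pseudo-counts for the MAP update.
        expected = [alpha - 1.0] * k
        for w, count in word_counts.items():
            # E-step: responsibility of each topic for term w.
            weights = [theta[t] * topics[t][w] for t in range(k)]
            total = sum(weights)
            if total == 0.0:
                continue  # term has zero probability under every topic
            for t in range(k):
                expected[t] += count * weights[t] / total
        norm = sum(expected)
        if norm == 0.0:
            break  # nothing usable in the document; keep current mixture
        # M-step: update only the document mixture, never the topics.
        theta = [e / norm for e in expected]
    return theta


# Toy example: two topics over a three-term vocabulary.
toy_topics = [
    [0.8, 0.1, 0.1],  # topic 0 favours term 0
    [0.1, 0.1, 0.8],  # topic 1 favours term 2
]
new_doc = {0: 5, 1: 1}  # term 0 appears five times, term 1 once
print(fold_in(new_doc, toy_topics))
```

If I read it right, running this with `iterations=1` from the uniform start is roughly the "sum the topic distributions of the terms" shortcut, and further iterations refine that estimate. Is that the intended approach?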
