Dani, "folding in," I believe, refers to setting up your Gibbs sampler (or other inference procedure) with the word-topic and document-topic proportions held fixed at the values learned by Spark.
You might look at https://lists.cs.princeton.edu/pipermail/topic-models/2014-May/002763.html, where Jones suggests summing across the columns of the term matrix for each of the doc's terms to get the topic proportions. I have not worked with Spark LDA, but if you can pull the theta and phi matrices out of the Spark model, you should be able to start with that approximation as inference.

Have you tried Vowpal Wabbit or gensim?

Cheers

On Friday, May 22, 2015, Dani Qiu <zongmin....@gmail.com> wrote:

> thanks, Ken
> but I am planning to use Spark LDA in production. I cannot wait for a
> future release.
> At least, please provide some workaround solution.
>
> PS: SPARK-5567 <https://issues.apache.org/jira/browse/SPARK-5567>
> mentions "This will require inference but should be able to use the same
> code, with a few modifications to keep the inferred topics fixed." Can
> somebody elaborate on that? Is it "folding-in" in EM? Or can I simply
> sum the topic distributions of the terms in the new document?
>
> On Fri, May 22, 2015 at 2:23 PM, Ken Geis <geis....@gmail.com> wrote:
>
>> Dani, this appears to be addressed in SPARK-5567
>> <https://issues.apache.org/jira/browse/SPARK-5567>, scheduled for Spark
>> 1.5.0.
>>
>> Ken
>>
>> On May 21, 2015, at 11:12 PM, user-digest-h...@spark.apache.org wrote:
>>
>> *From:* Dani Qiu <zongmin....@gmail.com>
>> *Subject:* *LDA prediction on new document*
>> *Date:* May 21, 2015 at 8:48:40 PM PDT
>> *To:* user@spark.apache.org
>>
>> Hi guys, I'm pretty new to LDA. I notice Spark 1.3.0 MLlib provides an
>> EM-based LDA implementation. It returns both the topics and the topic
>> distributions.
>>
>> My question is: how can I use these parameters to predict on a new
>> document?
>>
>> I also notice there is an online LDA implementation in the Spark master
>> branch that only returns topics. How can I use it to do prediction on a
>> new document (and on the trained documents)?
>>
>> thanks
>>
>
--
Charles
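The column-summing approximation discussed above can be sketched as follows. This is only an illustration, not Spark's API: the phi values are made up, and `approx_topic_proportions` is a hypothetical helper; with Spark's EM LDA you would build phi from `DistributedLDAModel.topicsMatrix` (a terms-by-topics matrix, transposed here so rows are topics).

```python
import numpy as np

# Illustrative phi: rows = topics, columns = vocabulary terms.
# Each row is a topic's distribution over the vocabulary.
phi = np.array([
    [0.60, 0.30, 0.05, 0.05],  # topic 0 favors terms 0 and 1
    [0.05, 0.05, 0.50, 0.40],  # topic 1 favors terms 2 and 3
])

def approx_topic_proportions(term_ids, phi):
    """Approximate a new document's topic mixture by summing, for each
    term in the document, that term's column of phi, then normalizing."""
    scores = phi[:, term_ids].sum(axis=1)
    return scores / scores.sum()

doc = [0, 1, 1, 3]  # term ids of the new document
theta_hat = approx_topic_proportions(doc, phi)
```

This is cheap but crude: it ignores the document-topic prior and does no iterative inference, so treat it as a starting point before proper folding-in with the topics held fixed.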