Dani, "folding in," I believe, refers to setting up your Gibbs sampler (or other inference procedure) with the word-topic and document-topic proportions held fixed at the values learned by Spark.
You might look at https://lists.cs.princeton.edu/pipermail/topic-models/2014-May/002763.html, where Jones suggests summing across the columns of the term matrix for each of the doc's terms to get the topic proportions. I have not worked with Spark LDA, but if you can pull the theta and phi matrices out of the Spark model, you should be able to start with that approximation as inference.

Have you tried Vowpal Wabbit or gensim?

Cheers

On Friday, May 22, 2015, Dani Qiu <zongmin....@gmail.com> wrote:

> thanks, Ken
> but I am planning to use Spark LDA in production. I cannot wait for a
> future release.
> At least, please provide some workaround solution.
>
> PS: SPARK-5567 <https://issues.apache.org/jira/browse/SPARK-5567>
> mentions "This will require inference but should be able to use the same
> code, with a few modifications to keep the inferred topics fixed." Can
> somebody elaborate on that? Is it "folding-in" in EM? Or can I simply
> sum the topic distributions of the terms in the new document?
>
> On Fri, May 22, 2015 at 2:23 PM, Ken Geis <geis....@gmail.com> wrote:
>
>> Dani, this appears to be addressed in SPARK-5567
>> <https://issues.apache.org/jira/browse/SPARK-5567>, scheduled for Spark
>> 1.5.0.
>>
>> Ken
>>
>> On May 21, 2015, at 11:12 PM, user-digest-h...@spark.apache.org wrote:
>>
>> *From:* Dani Qiu <zongmin....@gmail.com>
>> *Subject:* *LDA prediction on new document*
>> *Date:* May 21, 2015 at 8:48:40 PM PDT
>> *To:* user@spark.apache.org
>>
>> Hi guys, I'm pretty new to LDA. I notice Spark 1.3.0 MLlib provides an
>> EM-based LDA implementation. It returns both the topics and the topic
>> distributions.
>>
>> My question is: how can I use these parameters to predict on a new
>> document?
>>
>> I also notice there is an online LDA implementation in the Spark master
>> branch that only returns topics. How can I use it to do prediction on a
>> new document (and on the trained documents)?
>>
>> thanks
>>
>
--
Charles
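The column-summing approximation discussed above can be sketched as follows. This is only an illustration, not Spark's API: the phi values are made up, and `approx_topic_proportions` is a hypothetical helper; with Spark's EM LDA you would build phi from `DistributedLDAModel.topicsMatrix` (a terms-by-topics matrix, transposed here so rows are topics).

```python
import numpy as np

# Illustrative phi: rows = topics, columns = vocabulary terms.
# Each row is a topic's distribution over the vocabulary.
phi = np.array([
    [0.60, 0.30, 0.05, 0.05],  # topic 0 favors terms 0 and 1
    [0.05, 0.05, 0.50, 0.40],  # topic 1 favors terms 2 and 3
])

def approx_topic_proportions(term_ids, phi):
    """Approximate a new document's topic mixture by summing, for each
    term in the document, that term's column of phi, then normalizing."""
    scores = phi[:, term_ids].sum(axis=1)
    return scores / scores.sum()

doc = [0, 1, 1, 3]  # term ids of the new document
theta_hat = approx_topic_proportions(doc, phi)
```

This is cheap but crude: it ignores the document-topic prior and does no iterative inference, so treat it as a starting point before proper folding-in with the topics held fixed.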