Re: LDA topic modeling and Spark

2015-12-03 Thread Robin East
What exactly is this probability distribution? For each word in your vocabulary, it is the probability that a word drawn at random from a topic is that word. Another way to visualise it is as a 2-column table where the 1st column is a word in your vocabulary and the 2nd column is the probability of
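
As a rough illustration of the two-column view described above, here is a minimal plain-Python sketch (no Spark involved). The vocabulary, the weights, and the helper name topic_distribution are all made up for the example and do not come from the Spark documentation or its sample data.

```python
# Hypothetical per-word weights for a single topic (assumed values,
# not taken from the Spark LDA example data).
topic_word_weights = {"spark": 6.0, "topic": 3.0, "model": 1.0}


def topic_distribution(word_weights):
    """Normalise per-word weights so that they sum to 1.

    The result maps each vocabulary word to the probability that a word
    drawn at random from this topic is that word.
    """
    total = sum(word_weights.values())
    if total <= 0:
        raise ValueError("topic has no weight to normalise")
    return {word: weight / total for word, weight in word_weights.items()}


distribution = topic_distribution(topic_word_weights)

# Print the two columns: word, then probability, most probable first.
for word, prob in sorted(distribution.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{prob:.2f}")
```

With these assumed weights the probabilities are 0.60, 0.30 and 0.10, and they sum to 1, which is what makes the column a probability distribution over the vocabulary.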

LDA topic modeling and Spark

2015-12-02 Thread Nguyen, Tiffany T
Hello, I have been trying to understand the LDA topic modeling example provided here: https://spark.apache.org/docs/latest/mllib-clustering.html#latent-dirichlet-allocation-lda. In the example, they load word count vectors from a text file and then they output th