Re: The explanation of input text format using LDA in Spark

2015-05-14 Thread Cui xp
Hi keegan, thanks a lot. Now I understand that the columns represent all the distinct words across all documents. What I still don't know is what determines the order of the words. Does it make any difference if the column words are in a different order? Thanks.

Re: The explanation of input text format using LDA in Spark

2015-05-12 Thread keegan
This matrix is in the format of a document-term matrix. Each row represents a single document, each column represents one of the possible words, and each element of the matrix is the count of that word in that document. A simple example is here: http://en.wikipedia.org/wiki/Document-term_matr
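To make the row/column layout concrete, here is a minimal sketch in plain Python (no Spark) that builds a document-term matrix from a toy corpus. The three documents and the helper name `to_row` are made up for illustration, and sorting the vocabulary is just one arbitrary way to fix the column order; this is not how LDAExample itself does it.

```python
from collections import Counter

# Hypothetical toy corpus, invented for this illustration.
docs = [
    "apple orange apple",
    "banana orange",
    "banana banana apple",
]

# Split each document into words on whitespace.
tokenized = [d.split() for d in docs]

# Vocabulary: every distinct word across all documents.
# Sorting is an arbitrary choice that makes the column order reproducible.
vocab = sorted({w for doc in tokenized for w in doc})

def to_row(tokens):
    """Return the word counts of one document, one entry per vocabulary word."""
    counts = Counter(tokens)
    return [counts.get(w, 0) for w in vocab]

# One row per document, one column per vocabulary word.
matrix = [to_row(t) for t in tokenized]

print(vocab)
for row in matrix:
    print(row)
```

Each printed row lines up with the printed vocabulary, so the first number in a row is the count of the first vocabulary word in that document, and so on.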

RE: The explanation of input text format using LDA in Spark

2015-05-08 Thread Yang, Yuhao
Hi Cui, Try reading the Scala version of LDAExample: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala The matrix you're referring to is the corpus after vectorization. One example, given a dict, [apple, orange, banana] 3 doc