Hi Keegan,
Thanks a lot. Now I understand that the columns represent all the distinct
words across all documents. What I don't know is what determines the order of
the words. Does it make any difference if the column words are in a different
order? Thanks.
This matrix is the format of a Document Term Matrix. Each row represents all
the words in a single document, each column represents just one of the
possible words, and the elements of the matrix are the corresponding word
counts.
There is a simple example here: http://en.wikipedia.org/wiki/Document-term_matr
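
To make the row/column layout concrete, here is a minimal sketch in plain Python (no Spark). The toy corpus, the helper names build_vocab and doc_term_matrix, and the alphabetical ordering are all assumptions made for illustration; they are not taken from LDAExample. It also suggests why column order is only a bookkeeping choice: a different word-to-index mapping gives the same counts with the columns permuted, as long as the same mapping is used for every document.

```python
from collections import Counter

# Hypothetical toy corpus: each document is already tokenized.
docs = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
    ["cherry", "cherry", "apple"],
]

def build_vocab(docs):
    """Map each distinct word to a column index (here: alphabetical order)."""
    words = sorted({w for doc in docs for w in doc})
    return {w: i for i, w in enumerate(words)}

def doc_term_matrix(docs, vocab):
    """One row per document, one column per vocabulary word, entries are counts."""
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for word, n in Counter(doc).items():
            row[vocab[word]] = n
        matrix.append(row)
    return matrix

vocab = build_vocab(docs)
matrix = doc_term_matrix(docs, vocab)

# A different (arbitrary) column order: same counts, columns permuted.
other_vocab = {"cherry": 0, "apple": 1, "banana": 2}
other_matrix = doc_term_matrix(docs, other_vocab)

for row in matrix:
    print(row)
```

With the alphabetical mapping, column 0 is apple, column 1 is banana and column 2 is cherry. Looking a word up through its own mapping returns the same count in both matrices, for example matrix[0][vocab["apple"]] and other_matrix[0][other_vocab["apple"]].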
Hi Cui,
Try reading the Scala version of LDAExample:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
The matrix you're referring to is the corpus after vectorization.
For example, given a dictionary [apple, orange, banana]
3 doc