RE: The explanation of input text format using LDA in Spark

Yang, Yuhao Fri, 08 May 2015 01:32:35 -0700

Hi Cui,

Try reading the Scala version of LDAExample:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala


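The matrix you're referring to is the corpus after vectorization: each line is one document, and each column holds the count of one dictionary word in that document.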

For example, given the dictionary [apple, orange, banana], the 3 documents
        Apple orange
        Orange banana
        Apple banana
can be represented by the dense vectors:
        1, 1, 0
        0, 1, 1
        1, 0, 1
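
If it helps, here is a minimal sketch of that vectorization step in plain Python (no Spark involved). The helper names build_vocab and vectorize are made up for illustration and are not part of any Spark API; the tokenization (lower-casing and splitting on whitespace) is also just an assumption, and real text would usually need more cleaning.

```python
from collections import Counter


def build_vocab(docs):
    """Assign each distinct lower-cased word a column index, in first-seen order.

    Hypothetical helper for illustration only.
    """
    vocab = {}
    for doc in docs:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def vectorize(doc, vocab):
    """Turn one document into a dense list of word counts, one slot per vocab word.

    Words that are not in the vocabulary are ignored.
    """
    counts = Counter(doc.lower().split())
    vec = [0] * len(vocab)
    for word, n in counts.items():
        if word in vocab:
            vec[vocab[word]] = n
    return vec


docs = ["Apple orange", "Orange banana", "Apple banana"]

# The dictionary from the example above, with explicit column positions.
vocab = {"apple": 0, "orange": 1, "banana": 2}

corpus = [vectorize(d, vocab) for d in docs]

# Print one space-separated line of counts per document.
for row in corpus:
    print(" ".join(str(n) for n in row))
```

With the three documents above this prints the same rows as the dense vectors (1 1 0, 0 1 1, 1 0 1). In Spark, each row would then be turned into an MLlib Vector and paired with a document id before being passed to LDA; the Scala LDAExample linked above shows one way to do the whole pipeline.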

Cheers,
Yuhao


-----Original Message-----
From: Cui xp [mailto:[email protected]] 
Sent: Wednesday, May 6, 2015 4:28 PM
To: [email protected]
Subject: The explanation of input text format using LDA in Spark

Hi all,
   After reading the example code that uses LDA in Spark, I found that the input 
text in the code is a matrix. The format of the text is as follows:
1 2 6 0 2 3 1 1 0 0 3
1 3 0 1 3 0 0 2 0 0 1
1 4 1 0 0 4 9 0 1 2 0
2 1 0 3 0 0 5 0 2 3 9
3 1 1 9 3 0 2 0 0 1 3
4 2 0 3 4 5 1 1 1 4 0
2 1 0 3 0 0 5 0 2 2 9
1 1 1 9 2 1 2 0 0 1 3
4 4 0 3 4 2 1 3 0 0 0
2 8 2 0 3 0 2 0 2 7 2
1 1 1 9 0 2 2 0 0 3 3
4 1 0 0 4 5 1 3 0 1 0
But I don't know what each line or each column means. And if I have 
several text documents, how do I process them to use LDA in Spark? Thanks.
                                                                                
                                            
Cui xp



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/The-explanation-of-input-text-format-using-LDA-in-Spark-tp22781.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected] For additional 
commands, e-mail: [email protected]

