Hi,
I'm new to MLlib and Spark. I'm trying to compute tf-idf and use those values
for term ranking.
I'm getting the tf values as vectors, but how can I get the individual values
out of a vector?
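import org.apache.spark.SparkContext
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

val sc = new SparkContext(conf)
// Each line of the input file is one document; split it into terms on spaces.
val documents: RDD[Seq[String]] =
  sc.textFile("/home/andrejs/Datasets/dbpedia/test.txt").map(_.split(" ").toSeq)
documents.foreach(println(_))
// Hash each term to an index and count term frequencies per document.
val hashingTF = new HashingTF()
val tf: RDD[Vector] = hashingTF.transform(documents)
tf.foreach(println(_))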
My output is:
WrappedArray(a, a, b, c)
WrappedArray(e, a, c, d)
(1048576,[97,99,100,101],[1.0,1.0,1.0,1.0])
(1048576,[97,98,99],[2.0,1.0,1.0])
How can I get the indices [97,99,100,101] and the values [1.0,1.0,1.0,1.0] out
of a vector?
And how can I map each index to its value, e.g. 100 -> 1.0?
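One possible approach, sketched under the assumption that the vectors printed
above are MLlib SparseVector instances (the (size,[indices],[values]) output
suggests so): match on SparseVector, read its indices and values arrays, and
zip them into index -> value pairs. The sample vector is built by hand from the
printed output, and the names sample, indexToValue and pairs are only
illustrative. It is meant to be pasted into spark-shell.

```scala
import org.apache.spark.mllib.linalg.{SparseVector, Vector, Vectors}

// Hand-built copy of the first printed vector (illustrative, not read from the RDD).
val sample: Vector =
  Vectors.sparse(1048576, Array(97, 99, 100, 101), Array(1.0, 1.0, 1.0, 1.0))

// Pair every stored index with its value.
def indexToValue(v: Vector): Map[Int, Double] = v match {
  // Sparse case: indices and values are parallel arrays.
  case sv: SparseVector => sv.indices.zip(sv.values).toMap
  // Fallback for other vector types: keep only the non-zero entries.
  case other =>
    other.toArray.zipWithIndex.collect { case (x, i) if x != 0.0 => i -> x }.toMap
}

val pairs = indexToValue(sample)
println(pairs(100)) // value stored at index 100
```

Applied to the RDD, tf.map(indexToValue) would presumably give one such map per
document.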
Any help is greatly appreciated,
Andrejs