Re: New API for TFIDF generation in Spark 1.1.0

nilesh Thu, 09 Oct 2014 11:09:12 -0700

Hi Xiangrui,

I am trying to implement TF-IDF following the instructions you sent in
your response to Jatin, but I am getting a compile error at the IDF step.
Here are my steps; everything compiles up to the last line.


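import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.regression.LabeledPoint

val labeledDocs = sc.textFile("title_subcategory")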

val stopwords = scala.io.Source.fromFile("stopwords.txt").getLines().toList

val labeledTerms = labeledDocs.map(_.split('\t')).map(x =>
  (x(2).toDouble, x(1).split(' ').map(_.toLowerCase).filter(!stopwords.contains(_)).toSeq))

val tf = new HashingTF()

val freqs = labeledTerms.map(x=>(x._1,tf.transform(x._2)))

val idf =  new IDF()

val idfModel = idf.fit(freqs.values)

val vectors = freqs.map(x => LabeledPoint(x._1, idfModel.transform(x._2)))

This is where it fails with the following error:

NBContentSubcategory.scala:39: error: overloaded method value transform with alternatives:
  (dataset: org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.linalg.Vector])org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.linalg.Vector] <and>
  (dataset: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector])org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
 cannot be applied to (org.apache.spark.mllib.linalg.Vector)
        val transformedValues = idfModel.transform(values)

It seems to be getting confused by the multiple (Java and Scala) transform
methods.
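
Reading the error more closely, both listed alternatives take a whole RDD
(or JavaRDD) of vectors and none takes a single Vector, so the call inside
map has no matching overload. A possible workaround, as a minimal sketch
assuming the Spark 1.1.0 MLlib API quoted in the error, is to transform the
term-frequency RDD as a whole and zip the result back with the labels. The
helper name toLabeledTfIdf is hypothetical, and zip assumes both RDDs keep
the same partitioning and element order, which should hold here because
both derive from the same cached RDD through element-wise maps.

```scala
import org.apache.spark.mllib.feature.IDF
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper: turns (label, term-frequency vector) pairs into
// LabeledPoints carrying TF-IDF features.
def toLabeledTfIdf(freqs: RDD[(Double, Vector)]): RDD[LabeledPoint] = {
  // Cache so the labels and the vectors are read from the same materialized data.
  freqs.cache()

  val labels: RDD[Double] = freqs.map(_._1)
  val tfs: RDD[Vector] = freqs.map(_._2)

  // Fit IDF on the term-frequency vectors, then transform the RDD as a whole
  // (the RDD[Vector] overload listed in the error message).
  val idfModel = new IDF().fit(tfs)
  val tfidf: RDD[Vector] = idfModel.transform(tfs)

  // Pair each label with its TF-IDF vector by position.
  labels.zip(tfidf).map { case (label, features) => LabeledPoint(label, features) }
}
```

With the steps above this would be called as val vectors = toLabeledTfIdf(freqs).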

Any insights?

Thanks,
Nilesh


