Hi,

I am trying to write and debug Spark applications using Scala IDE and
Maven, and my code targets a Spark instance at spark://xxx:

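import org.apache.spark.{SparkConf, SparkContext}
// Brings reduceByKey into scope for pair RDDs on older Spark releases.
import org.apache.spark.SparkContext._

object App {

  def main(args: Array[String]): Unit = {
    println("Hello World!")
    // Point the driver at the remote standalone master.
    val sparkConf = new SparkConf().setMaster("spark://xxx:7077").setAppName("WordCount")

    val spark = new SparkContext(sparkConf)
    val file = spark.textFile("hdfs://xxx:9000/wcinput/pg1184.txt")
    // Split each line into words, then count the occurrences of each word.
    val counts = file.flatMap(line => line.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://flex05.watson.ibm.com:9000/wcoutput")
  }

}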

I added spark-core and hadoop-client as Maven dependencies, so the code
compiles fine.

When I click Run in Eclipse, I get this error:

14/06/06 20:52:18 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: samples.App$$anonfun$2

I googled this error, and it seems that I need to package my code into a
jar file and push it to the Spark nodes. But since I am still debugging the
code, it would be handy if I could see results quickly without packaging and
uploading a jar on every change.
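
For reference, here is a minimal sketch of the two options as I understand them: list the built jar with SparkConf.setJars so that it is shipped to the executors, or switch the master to local[*] while debugging so that no jar is needed. The jar path target/wordcount-1.0.jar, the object name DebugConf, and the useCluster flag are placeholders I made up for illustration. They are not from my actual project.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DebugConf {
  // Hypothetical path: wherever `mvn package` writes the application jar.
  val appJar = "target/wordcount-1.0.jar"

  // Build a SparkConf for either the remote cluster or an in-process local run.
  def build(useCluster: Boolean): SparkConf = {
    val base = new SparkConf().setAppName("WordCount")
    if (useCluster)
      // setJars lists jars that Spark distributes to the executors, so that
      // closure classes such as samples.App$$anonfun$2 can be loaded there.
      base.setMaster("spark://xxx:7077").setJars(Seq(appJar))
    else
      // local[*] runs the driver and the tasks in one JVM; no jar is needed.
      base.setMaster("local[*]")
  }

  def main(args: Array[String]): Unit = {
    // Debug locally first; flip the flag once the jar has been rebuilt.
    val sc = new SparkContext(build(useCluster = false))
    try {
      val words = sc.parallelize(Seq("a b", "b")).flatMap(_.split(" "))
      println(words.count())
    } finally {
      sc.stop()
    }
  }
}
```

With this layout the jar would still have to be rebuilt before each cluster run, which is the step I am hoping to avoid.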

What is the best practice for writing a Spark application (like WordCount)
and debugging it quickly against a remote Spark instance?

Thanks!
Wei


---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
