Hi, I am trying to write and debug Spark applications using Scala IDE and Maven. My code targets a remote Spark instance at spark://xxx:

    import org.apache.spark.{SparkConf, SparkContext}
    // Pair-RDD implicits such as reduceByKey (needed on Spark 1.x before 1.3)
    import org.apache.spark.SparkContext._

    object App {
      def main(args: Array[String]): Unit = {
        println("Hello World!")
        // Connect to the remote standalone master
        val sparkConf = new SparkConf().setMaster("spark://xxx:7077").setAppName("WordCount")
        val spark = new SparkContext(sparkConf)
        val file = spark.textFile("hdfs://xxx:9000/wcinput/pg1184.txt")
        // Split each line into words and count the occurrences of each word
        val counts = file.flatMap(line => line.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.saveAsTextFile("hdfs://flex05.watson.ibm.com:9000/wcoutput")
      }
    }

I added spark-core and hadoop-client as Maven dependencies, so the code compiles fine. When I click Run in Eclipse, I get this error:

    14/06/06 20:52:18 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
    java.lang.ClassNotFoundException: samples.App$$anonfun$2

I googled this error, and it seems that I need to package my code into a jar file and push it to the Spark nodes. But since I am debugging the code, it would be handy if I could quickly see results without packaging and uploading jars.

What is the best practice for writing a Spark application (like word count) and debugging it quickly on a remote Spark instance?

Thanks!
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan