I just wanted to let you know, Marcelo, and others who may run into this error in the future... I figured it out!
When I first started working on my scripts, I was running "sbt/sbt package" followed by "sbt/sbt run". But when I saw that "sbt/sbt run" compiled the script on its own, I stopped bothering with "sbt/sbt package", not realizing that it is the step that actually builds the jar that gets distributed to all the nodes. It turns out the jar my script was shipping was very old and didn't contain the .class file the program needed, hence the ClassNotFoundException. So, in the future: always run "sbt/sbt package" before "sbt/sbt run".

Thank you for all your help, Marcelo.
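In case it helps anyone who finds this thread later, here is a trimmed-down sketch of roughly what the working app looks like. The input path is only a placeholder (and the master IP is removed, as before), and I've simplified a bit, so treat it as a sketch rather than an exact copy of my file:

    // SimpleApp.scala -- rebuild with "sbt/sbt package" before every "sbt/sbt run",
    // so that simple-project_2.10-1.0.jar contains the current classes
    // (including the SimpleApp$$anonfun$* classes generated for the closures).
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // enables reduceByKey on (key, value) pairs

    object SimpleApp {
      def main(args: Array[String]) {
        val sc = new SparkContext("spark://<masternodeip>:7077",
          "Simple App", "/usr/local/pkg/spark",
          List("target/scala-2.10/simple-project_2.10-1.0.jar"))

        val logData = sc.textFile("/path/to/input.txt")  // placeholder path

        // This always ran fine on the cluster:
        val numAs = logData.filter(line => line.contains("a")).count()

        // This is the call that threw ClassNotFoundException until the jar
        // was rebuilt with "sbt/sbt package":
        val counts = logData.flatMap(line => line.split(" "))
                            .map(word => (word, 1))
                            .reduceByKey(_ + _)

        counts.collect().foreach(println)
        println("Lines with a: " + numAs)
      }
    }

With a freshly packaged jar, both the filter() and the reduceByKey() closures make it out to the worker nodes and the job runs cleanly.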
Ian


On Mon, Apr 14, 2014 at 2:59 PM, Ian Bonnycastle <ibo...@gmail.com> wrote:
> Hi Marcelo,
>
> Changing it to null didn't make any difference at all. /usr/local/pkg/spark is also on all the nodes... it has to be in order to get all the nodes up and running in the cluster. Also, I'm confused by what you mean with "That's most probably the class that implements the closure you're passing as an argument to the reduceByKey() method". The only thing I'm passing to it is "_ + _"... and, as you mentioned, it's pretty much the same as the map() method.
>
> If I run the following code, it runs 100% properly on the cluster:
>
>     val numAs = logData.filter(line => line.contains("a")).count()
>
> So this is a closure passed to the filter() method, and it doesn't have any problems at all. Also, if I run the reduceByKey in local mode, it runs perfectly. So, as you mentioned, it almost sounds like the code, or the closure, is not getting to all the nodes properly. But why reduceByKey is the only method affected is beyond me.
>
> Ian
>
>
> On Mon, Apr 14, 2014 at 2:45 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> Hi Ian,
>>
>> On Mon, Apr 14, 2014 at 11:30 AM, Ian Bonnycastle <ibo...@gmail.com> wrote:
>> >     val sc = new SparkContext("spark://<masternodeip>:7077",
>> >       "Simple App", "/usr/local/pkg/spark",
>> >       List("target/scala-2.10/simple-project_2.10-1.0.jar"))
>>
>> Hmmm... does /usr/local/pkg/spark exist on all the worker nodes? (I haven't particularly tried using the sparkHome argument myself, nor have I traced through the code to see exactly what it does, but...) I'd try setting the "sparkHome" argument to null and seeing if that helps. (It has been working for me without it.) Since you're already listing your app's jar file there, you don't need to explicitly call addJar().
>>
>> Note that the class that isn't being found is not a Spark class; it's a class from your app (SimpleApp$$anonfun$3). That's most probably the class that implements the closure you're passing as an argument to the reduceByKey() method. Although I can't really explain why the same isn't happening for the closure you're passing to map()...
>>
>> Sorry I can't be more helpful.
>>
>> > I still get the error, though, with ClassNotFoundException, unless I'm not understanding how to use sc.addJar. I find it a little weird, too, that the Spark platform has trouble finding its own code. And why is it occurring only with the reduceByKey function? I have no problems with any other code except for that. (BTW, I don't use <masternodeip> in my code above... I just removed it for security purposes.)
>> >
>> > Thanks,
>> >
>> > Ian
>> >
>> >
>> > On Mon, Apr 14, 2014 at 12:45 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>> >>
>> >> Hi Ian,
>> >>
>> >> When you run your packaged application, are you adding its jar file to the SparkContext (by calling the addJar() method)?
>> >>
>> >> That will distribute the code to all the worker nodes. The failure you're seeing seems to indicate the worker nodes do not have access to your code.
>> >>
>> >> On Mon, Apr 14, 2014 at 9:17 AM, Ian Bonnycastle <ibo...@gmail.com> wrote:
>> >> > Good afternoon,
>> >> >
>> >> > I'm attempting to get the wordcount example working, and I keep getting an error in the "reduceByKey(_ + _)" call. I've scoured the mailing lists and haven't been able to find a surefire solution, unless I'm missing something big. I did find something close, but it didn't appear to work in my case. The error is:
>> >> >
>> >> > org.apache.spark.SparkException: Job aborted: Task 2.0:3 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: SimpleApp$$anonfun$3)
>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>> >>
>> >> --
>> >> Marcelo
>>
>> --
>> Marcelo
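P.S. For completeness: Marcelo's first suggestion in the quoted thread above was to ship the jar to the workers with an explicit addJar() call instead of listing it in the SparkContext constructor. I didn't end up needing that, but for anyone who prefers it, a rough sketch of that variant (same placeholder paths as above, and the jar still has to be rebuilt with "sbt/sbt package" first) would look something like this:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object SimpleApp {
      def main(args: Array[String]) {
        val sc = new SparkContext("spark://<masternodeip>:7077", "Simple App")

        // Distribute the packaged jar to the worker nodes explicitly.
        sc.addJar("target/scala-2.10/simple-project_2.10-1.0.jar")

        val counts = sc.textFile("/path/to/input.txt")  // placeholder path
          .flatMap(line => line.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.collect().foreach(println)
      }
    }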