Re: DAGScheduler: Failed to run foreach

2014-06-24 Thread Aaron Davidson
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runW

RE: DAGScheduler: Failed to run foreach

2014-06-24 Thread Sameer Tilak
ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

From: ilike...@gmail.com
Date: Mon, 23 Jun 2014 18:00:27 -0700
Subject: Re: DAGScheduler: Failed t

Re: DAGScheduler: Failed to run foreach

2014-06-23 Thread Aaron Davidson
Please note that this:

  for (sentence <- sourcerdd) { ... }

is actually Scala syntactic sugar which is converted into

  sourcerdd.foreach { sentence => ... }

What this means is that the body will actually run on the cluster, which is probably not what you want if you're trying to print the sentences. Try t
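A minimal sketch of that desugaring, assuming sourcerdd is an RDD[String] of sentences as in this thread. The SparkContext setup and the take(10) workaround at the end are illustrative only, since the original suggestion is cut off in the preview above:

  import org.apache.spark.{SparkConf, SparkContext}

  object ForeachDesugarSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(
        new SparkConf().setAppName("foreach-desugar").setMaster("local[*]"))
      val sourcerdd = sc.parallelize(Seq("first sentence", "second sentence"))

      // The for-comprehension below...
      for (sentence <- sourcerdd) {
        // ...desugars to sourcerdd.foreach { sentence => ... }, so this body
        // runs on the executors; println output ends up in executor logs,
        // not on the driver console.
        println(sentence)
      }

      // To inspect results on the driver instead, bring them back first
      // (take(n) is safer than collect() for large RDDs).
      sourcerdd.take(10).foreach(println)

      sc.stop()
    }
  }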

RE: DAGScheduler: Failed to run foreach

2014-06-23 Thread Sameer Tilak
The subject should be: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: and not DAGScheduler: Failed to run foreach. If I call printScoreCanndedString with a hard-coded string and an identical 2nd parameter, it works fine.
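For context, the usual cause of this kind of NotSerializableException is a closure that captures a non-serializable enclosing object. The sketch below is illustrative only: the poster's actual printScoreCanndedString signature is not shown in the thread, so a (String, String) signature and the surrounding class/object names are assumptions.

  import org.apache.spark.rdd.RDD

  // Problematic shape: calling an instance method inside the closure drags
  // `this` (the whole enclosing, non-serializable object) into the task.
  class ScoringJob {
    def printScoreCanndedString(sentence: String, other: String): Unit =
      println(s"$sentence -> $other")

    def run(sourcerdd: RDD[String]): Unit =
      // Captures `this`, which is not Serializable, so the stage fails with
      // java.io.NotSerializableException when Spark ships the task.
      sourcerdd.foreach(sentence => printScoreCanndedString(sentence, "param"))
  }

  // One common fix: move the helper into a standalone serializable object so
  // the closure no longer references the non-serializable enclosing instance.
  object ScoringHelpers extends Serializable {
    def printScoreCanndedString(sentence: String, other: String): Unit =
      println(s"$sentence -> $other")

    def run(sourcerdd: RDD[String]): Unit =
      sourcerdd.foreach(sentence =>
        ScoringHelpers.printScoreCanndedString(sentence, "param"))
  }

Another option, when the enclosing class genuinely needs to hold state, is to make it extend Serializable, or to copy the needed fields into local vals before the foreach so only those values are captured.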