Which release of Spark are you using? Thanks
> On May 6, 2015, at 8:03 AM, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:
>
> I run a job on spark standalone cluster and got the exception below
>
> Here is the line of code that cause problem
>
> val myRdd: RDD[(String, String, String)] = … // RDD of (docid, cattegory, path)
>
> myRdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
>
> val cats: Array[String] = myRdd.map(t => t._2).distinct().collect() // This line cause the exception
>
>
> 15/05/06 10:48:51 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, LAB4-WIN03.pcc.lexisnexis.com): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_2_piece0 of broadcast_2
>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1156)
>         at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
>         at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
>         at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
>         at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
>         at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:61)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_2_piece0 of broadcast_2
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1153)
>         ... 12 more
>
>
> Any idea what cause the problem and how to avoid it?
>
> Thanks
> Ningjun