Are you using Spark standalone mode? If so, you need to set "spark.io.compression.codec" for all workers.
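For example, a minimal sketch (assuming a Spark 1.x standalone cluster; paths may differ in your install): one way is to put the codec into conf/spark-defaults.conf on every node:

    spark.io.compression.codec   org.apache.spark.io.LZ4CompressionCodec

And on the driver side, make sure the SparkConf carrying the setting is actually passed to the SparkContext; a context created without it will not see the setting:

    // build the conf and hand it to the context
    val conf = new SparkConf()
      .setAppName("binary")
      .set("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
    val sc = new SparkContext(conf)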
Best Regards,
Shixiong Zhu

2014-10-28 10:37 GMT+08:00 buring <qyqb...@gmail.com>:

> Here is the error log; I've abstracted it as follows:
>
> INFO [binaryTest---main]: before first
>
> WARN [org.apache.spark.scheduler.TaskSetManager---Result resolver thread-0]:
> Lost task 0.0 in stage 0.0 (TID 0, spark-dev136):
> org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>     org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:236)
>     org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
>     org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:351)
>     org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:159)
>     org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:142)
>     java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2288)
>
> WARN [org.apache.spark.scheduler.TaskSetManager---Result resolver thread-1]:
> Lost task 0.1 in stage 0.0 (TID 2, spark-dev136):
> java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
>     org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:351)
>     org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:159)
>     org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:142)
>     java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2288)
>     java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2301)
>     java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2772)
>     java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:778)
>     java.io.ObjectInputStream.<init>(ObjectInputStream.java:278)
>     org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:57)
>
> ERROR [org.apache.spark.scheduler.TaskSchedulerImpl---sparkDriver-akka.actor.default-dispatcher-17]:
> Lost executor 1 on spark-dev136: remote Akka client disassociated
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent
> failure: Lost task 0.3 in stage 0.0 (TID 4, spark-dev134):
> java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
>     org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:351)
>     org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:159)
>     org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:142)
>     java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2288)
>
> That's the error log from the console.
>
> My test code is as follows; it runs correctly on my notebook. I know
> there is something wrong with the Spark cluster, so I want to avoid using
> snappy compression to work around this problem.
> val conf = new SparkConf().setAppName("binary")
> conf.set("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
> val sc = new SparkContext(conf)  // pass the conf, or the codec setting is not used
>
> val arr = Array("One, two, buckle my shoe",
>   "Three, four, shut the door", "Five, six, pick up sticks",
>   "Seven, eight, lay them straight", "Nine, ten, a big fat hen")
> val pairs = arr.indices zip arr
> // implicit def int2IntWritable(fint: Int): IntWritable = new IntWritable()
> // implicit def string2Writable(fstring: String): Text = new Text()
> val rdd = sc.makeRDD(pairs)
> logInfo("before first")
> println(rdd.first())
> logInfo("after first")
> val seq = new SequenceFileRDDFunctions(rdd)
> seq.saveAsSequenceFile(args(0))
>
> Thanks