To add to Aaron's response: `spark.shuffle.consolidateFiles` only applies to the hash-based shuffle, so you shouldn't need to set it when using the sort-based shuffle. And yes, since you changed neither `spark.shuffle.compress` nor `spark.shuffle.spill.compress`, you can't have run into what #2890 fixes.
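If you want to double-check what those two settings resolve to at runtime, here's a quick sketch (assuming nothing else in your environment overrides them; the object name is made up, and `getBoolean` just falls back to the default when the key is unset):

    import org.apache.spark.SparkConf

    // Prints the effective values of the two shuffle compression settings.
    // Both default to true, so unless they were overridden somewhere this
    // should print "true" twice.
    object ShuffleCompressCheck {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf() // picks up spark.* system properties
        println("spark.shuffle.compress = " +
          conf.getBoolean("spark.shuffle.compress", true))
        println("spark.shuffle.spill.compress = " +
          conf.getBoolean("spark.shuffle.spill.compress", true))
      }
    }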
I'm assuming you're running master? Was it before or after this commit:
https://github.com/apache/spark/commit/6b79bfb42580b6bd4c4cd99fb521534a94150693 ?

-Andrew

2014-10-22 16:37 GMT-07:00 Aaron Davidson <ilike...@gmail.com>:

> You may be running into this issue:
> https://issues.apache.org/jira/browse/SPARK-4019
>
> You could check by having 2000 or fewer reduce partitions.
>
> On Wed, Oct 22, 2014 at 1:48 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>
>> PS, sorry for spamming the mailing list. Based on my knowledge, both
>> spark.shuffle.spill.compress and spark.shuffle.compress default to
>> true, so in theory we should not run into this issue if we don't
>> change any settings. Is there any other bug we could be running into?
>>
>> Thanks.
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Wed, Oct 22, 2014 at 1:37 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>> > Or can it be solved by setting both of the following settings to true
>> > for now?
>> >
>> > spark.shuffle.spill.compress true
>> > spark.shuffle.compress true
>> >
>> > Sincerely,
>> >
>> > DB Tsai
>> > -------------------------------------------------------
>> > My Blog: https://www.dbtsai.com
>> > LinkedIn: https://www.linkedin.com/in/dbtsai
>> >
>> >
>> > On Wed, Oct 22, 2014 at 1:34 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>> >> It seems that this issue should be addressed by
>> >> https://github.com/apache/spark/pull/2890 ? Am I right?
>> >>
>> >> Sincerely,
>> >>
>> >> DB Tsai
>> >> -------------------------------------------------------
>> >> My Blog: https://www.dbtsai.com
>> >> LinkedIn: https://www.linkedin.com/in/dbtsai
>> >>
>> >>
>> >> On Wed, Oct 22, 2014 at 11:54 AM, DB Tsai <dbt...@dbtsai.com> wrote:
>> >>> Hi all,
>> >>>
>> >>> With SPARK-3948, the exception in Snappy PARSING_ERROR is gone, but
>> >>> I've hit another exception now. I have no clue what's going on; has
>> >>> anyone run into a similar issue? Thanks.
>> >>>
>> >>> This is the configuration I use.
>> >>> spark.rdd.compress true
>> >>> spark.shuffle.consolidateFiles true
>> >>> spark.shuffle.manager SORT
>> >>> spark.akka.frameSize 128
>> >>> spark.akka.timeout 600
>> >>> spark.core.connection.ack.wait.timeout 600
>> >>> spark.core.connection.auth.wait.timeout 300
>> >>>
>> >>> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
>> >>> java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
>> >>> java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
>> >>> java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
>> >>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:57)
>> >>> org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:57)
>> >>> org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:95)
>> >>> org.apache.spark.storage.BlockManager.getLocalShuffleFromDisk(BlockManager.scala:351)
>> >>> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196)
>> >>> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196)
>> >>> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
>> >>> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
>> >>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>> >>> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>> >>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>> >>> org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89)
>> >>> org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
>> >>> org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
>> >>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>> >>> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>> >>> org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>> >>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>> >>> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>> >>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>> >>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>> >>> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>> >>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>> >>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> >>> org.apache.spark.scheduler.Task.run(Task.scala:56)
>> >>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>> >>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>> java.lang.Thread.run(Thread.java:744)
>> >>>
>> >>> Sincerely,
>> >>>
>> >>> DB Tsai
>> >>> -------------------------------------------------------
>> >>> My Blog: https://www.dbtsai.com
>> >>> LinkedIn: https://www.linkedin.com/in/dbtsai
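One concrete way to run Aaron's 2000-partition check above, as a self-contained sketch (the synthetic RDD, key count, and app name are stand-ins for the real job):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x

    // SPARK-4019 was tied to stages with more than 2000 partitions (the
    // threshold at which Spark switches to HighlyCompressedMapStatus), so
    // if the failure disappears once the reduce side is capped at 2000,
    // that issue is the likely culprit.
    object PartitionCapCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("partition-cap-check"))
        val pairs = sc.parallelize(1 to 1000000).map(i => (i % 5000, 1L))
        // Explicit reduce-side partition count of 2000 or fewer.
        val counts = pairs.reduceByKey(_ + _, numPartitions = 2000)
        println(counts.count())
        sc.stop()
      }
    }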