Hi,

We are running an aggregation over a huge data set (a few billion rows). While the job was running, a task failed with the error below. Any ideas? We are on Spark 1.1.0, CDH distribution.
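For reference, the job is essentially a key-based aggregation over the raw rows. Here is a rough sketch of its shape (the path, the parsing, and the app name are placeholders, not our actual code):

import org.apache.spark.{SparkConf, SparkContext}

// Rough shape of the failing job (all names and paths are placeholders).
val sc = new SparkContext(new SparkConf().setAppName("BigAggregation"))

val rows = sc.textFile("hdfs:///path/to/input")   // a few billion rows

// Build (key, value) pairs; the real parsing is more involved.
val pairs = rows.map { line =>
  val fields = line.split('\t')
  (fields(0), fields(1).toLong)
}

// The shuffle behind this aggregation is where the error below is thrown.
val totals = pairs.reduceByKey(_ + _)
totals.saveAsTextFile("hdfs:///path/to/output")

The error appears while the reducer side of that shuffle is fetching its local blocks: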
...
14/09/17 13:33:30 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 2083 bytes result sent to driver
14/09/17 13:33:30 INFO CoarseGrainedExecutorBackend: Got assigned task 1
14/09/17 13:33:30 INFO Executor: Running task 0.0 in stage 2.0 (TID 1)
14/09/17 13:33:30 INFO TorrentBroadcast: Started reading broadcast variable 2
14/09/17 13:33:30 INFO MemoryStore: ensureFreeSpace(1428) called with curMem=163719, maxMem=34451478282
14/09/17 13:33:30 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1428.0 B, free 32.1 GB)
14/09/17 13:33:30 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
14/09/17 13:33:30 INFO TorrentBroadcast: Reading broadcast variable 2 took 0.027374294 s
14/09/17 13:33:30 INFO MemoryStore: ensureFreeSpace(2336) called with curMem=165147, maxMem=34451478282
14/09/17 13:33:30 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.3 KB, free 32.1 GB)
14/09/17 13:33:30 INFO MapOutputTrackerWorker: Updating epoch to 1 and clearing cache
14/09/17 13:33:30 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 1, fetching them
14/09/17 13:33:30 INFO MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://sparkdri...@sas-model1.pv.sv.nextag.com:56631/user/MapOutputTracker#794212052]
14/09/17 13:33:30 INFO MapOutputTrackerWorker: Got the output locations
14/09/17 13:33:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/09/17 13:33:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
14/09/17 13:33:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 8 ms
14/09/17 13:33:30 ERROR BlockFetcherIterator$BasicBlockFetcherIterator: Error occurred while fetching local blocks
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:828)
    at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:104)
    at org.apache.spark.storage.DiskStore.getValues(DiskStore.scala:120)
    at org.apache.spark.storage.BlockManager.getLocalFromDisk(BlockManager.scala:358)
    at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:208)
    at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:205)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator.getLocalBlocks(BlockFetcherIterator.scala:205)
    at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator.initialize(BlockFetcherIterator.scala:240)
    at org.apache.spark.storage.BlockManager.getMultiple(BlockManager.scala:583)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:77)
    at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:41)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/09/17 13:33:30 INFO CoarseGrainedExecutorBackend: Got assigned task 2
14/09/17 13:33:30 INFO Executor: Running task 0.0 in stage 1.1 (TID 2)