Re: work around Size exceeds Integer.MAX_VALUE

2015-07-09 Thread Michal Čizmazia
Thanks Matei! It worked.

On 9 July 2015 at 19:43, Matei Zaharia wrote:
> This means that one of your cached RDD partitions is bigger than 2 GB of
> data. You can fix it by having more partitions. If you read data from a
> file system like HDFS or S3, set the number of partitions higher in the …

Re: work around Size exceeds Integer.MAX_VALUE

2015-07-09 Thread Matei Zaharia
This means that one of your cached RDD partitions is bigger than 2 GB of data. You can fix it by having more partitions. If you read data from a file system like HDFS or S3, set the number of partitions higher in the sc.textFile, hadoopFile, etc. methods (it's an optional second parameter to those methods).
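For readers landing here from search, a minimal sketch of the two standard knobs, assuming a hypothetical HDFS path and a partition count of 1000 (the right number depends on your data size). sc.textFile's optional minPartitions argument is the parameter Matei refers to; repartition() is the usual alternative for an RDD that already exists:

    import org.apache.spark.{SparkConf, SparkContext}

    object MorePartitions {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("more-partitions"))

        // minPartitions (the optional second parameter) asks Spark for at
        // least this many input splits, keeping each cached partition well
        // under the 2 GB block limit. The path is hypothetical.
        val lines = sc.textFile("hdfs:///data/input.txt", 1000)

        // For an RDD built some other way, repartition() reshuffles it into
        // the requested number of partitions before caching.
        val resized = lines.repartition(1000)
        resized.cache()

        println(s"partitions: ${resized.partitions.length}")
        sc.stop()
      }
    }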

Re: work around Size exceeds Integer.MAX_VALUE

2015-07-09 Thread Michal Čizmazia
Spark version 1.4.0 in standalone mode.

2015-07-09 20:12:02 INFO (sparkDriver-akka.actor.default-dispatcher-3) BlockManagerInfo:59 - Added rdd_0_0 on disk on localhost:51132 (size: 29.8 GB)
2015-07-09 20:12:02 ERROR (Executor task launch worker-0) Executor:96 - Exception in task 0.0 in stage 0…
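As a back-of-the-envelope check (my arithmetic, not from the thread): the log shows a single cached partition of 29.8 GB, so the minimum count to get each partition under the 2 GB limit, and a more comfortable target of roughly 128 MB per partition, work out as:

    // Sizes taken from the log line above; the 128 MB target is an
    // assumption, not something stated in the thread.
    val totalBytes    = (29.8 * (1L << 30)).toLong                            // 29.8 GB
    val minPartitions = math.ceil(totalBytes.toDouble / Int.MaxValue).toInt   // 15
    val comfortable   = math.ceil(totalBytes.toDouble / (128L << 20)).toInt   // 239
    println(s"at least $minPartitions partitions; about $comfortable at 128 MB each")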

Re: work around Size exceeds Integer.MAX_VALUE

2015-07-09 Thread Ted Yu
Which release of Spark are you using? Can you show the complete stack trace?

getBytes() could be called from:

    getBytes(file, 0, file.length)

or:

    getBytes(segment.file, segment.offset, segment.length)

Cheers

On Thu, Jul 9, 2015 at 2:50 PM, Michal Čizmazia wrote:
> Please could anyone…
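For context on where the message itself originates (my reading of the call sites Ted quotes, not confirmed in the thread): both paths end up memory-mapping a block file, and java.nio.channels.FileChannel.map rejects any mapping longer than Integer.MAX_VALUE bytes with exactly this message. A self-contained sketch of the JVM limit, using a hypothetical file path:

    import java.io.RandomAccessFile
    import java.nio.channels.FileChannel

    object MapLimitDemo {
      def main(args: Array[String]): Unit = {
        // "/tmp/big-block" stands in for a cached block file over 2 GB.
        val channel = new RandomAccessFile("/tmp/big-block", "r").getChannel
        try {
          // FileChannel.map takes a long size but throws
          // IllegalArgumentException("Size exceeds Integer.MAX_VALUE")
          // whenever size > Int.MaxValue.
          channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
        } finally {
          channel.close()
        }
      }
    }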