Here is the related code:

    final int length = totalSize() + neededSize;
    if (buffer.length < length) {
      // This will not happen frequently, because the buffer is re-used.
      final byte[] tmp = new byte[length * 2];
Looks like length was positive (since it was bigger than buffer.length), but length * 2 overflowed and became negative. We just need to allocate length bytes instead of length * 2 bytes (a sketch of an overflow-safe grow is appended below the quoted message).

On Sun, Mar 13, 2016 at 10:39 PM, Ravindra Rawat <ravindra.ra...@gmail.com> wrote:

> Greetings,
>
> I am getting the following exception on joining a few parquet files. The
> SPARK-12089 description has details of the overflow condition, which is
> marked as fixed in 1.6.1. I recall seeing another issue related to csv
> files creating the same exception.
>
> Any pointers on how to debug this or possible workarounds? Google searches
> and JIRA comments point to either a >2GB record size (less likely) or RDD
> sizes being too large.
>
> I had upgraded to Spark 1.6.1 due to Serialization errors from Catalyst
> while reading Parquet files.
>
> Related JIRA Issue => https://issues.apache.org/jira/browse/SPARK-12089
>
> Related PR => https://github.com/apache/spark/pull/10142
>
>
> java.lang.NegativeArraySizeException
>         at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:45)
>         at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:196)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>         at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> Thanks.
>
> --
> Regards
> Ravindra
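For reference, here is a minimal sketch of an overflow-safe grow along those lines. The class, field, and method names are hypothetical stand-ins (cursor plays the role of totalSize()); this is not the actual Spark patch, just an illustration of allocating length bytes when doubling would overflow:

    // Hypothetical buffer with an overflow-safe grow(), for illustration only.
    public final class GrowableBuffer {
        private byte[] buffer = new byte[64];
        private int cursor = 0; // bytes already written, stands in for totalSize()

        public void grow(int neededSize) {
            final int length = cursor + neededSize;
            if (length < 0) {
                // cursor + neededSize itself overflowed the int range.
                throw new IllegalStateException("Buffers larger than 2GB are not supported");
            }
            if (buffer.length < length) {
                // Doubling can overflow (length * 2 < 0); in that case fall back
                // to allocating exactly length bytes instead of a negative size.
                final long doubled = (long) length * 2;
                final int newLength = doubled > Integer.MAX_VALUE ? length : (int) doubled;
                final byte[] tmp = new byte[newLength];
                System.arraycopy(buffer, 0, tmp, 0, cursor);
                buffer = tmp;
            }
        }

        public static void main(String[] args) {
            GrowableBuffer b = new GrowableBuffer();
            b.grow(1024); // normal case: buffer doubles without overflow
            System.out.println("grew without overflow");
        }
    }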