Re: ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions()

2015-09-10, Ted Yu
Created https://github.com/apache/spark/pull/8703 to make the exception message more helpful.

On Thu, Sep 10, 2015 at 1:24 PM, Ashish Shenoy wrote:
> Yup thanks Ted. My getPartition() method had a bug where a signed int was
> being moduloed with the number of partitions. Fixed that.
>
> Thanks,
> As

2015-09-10, Ashish Shenoy
Yup, thanks Ted. My getPartition() method had a bug where a signed int was being taken modulo the number of partitions, which can yield a negative partition ID. Fixed that.

Thanks,
Ashish

On Thu, Sep 10, 2015 at 10:44 AM, Ted Yu wrote:
> Here is snippet of ExternalSorter.scala where ArrayIndexOutOfBoundsException
> was thrown:
>
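The bug described here comes from Java's `%` operator, which takes the sign of the dividend: a key whose hash is negative produces a negative partition ID. The thread does not show the actual partitioner or its fix, so the sketch below is a hypothetical, Spark-free stand-in; `buggyPartition` and `fixedPartition` are illustrative names, and the fix uses `Math.floorMod` (Java 8+). Spark's built-in `HashPartitioner` applies a similar non-negative modulo to `key.hashCode`.

```java
// Hypothetical stand-in for the getPartition() bug discussed above; plain Java, no Spark.
class PartitionModDemo {

    // Buggy: Java's % takes the sign of the dividend, so a negative
    // hash yields a negative partition ID.
    static int buggyPartition(int hash, int numPartitions) {
        return hash % numPartitions;
    }

    // Fixed: Math.floorMod (Java 8+) returns a value in [0, numPartitions)
    // whenever numPartitions is positive, including for Integer.MIN_VALUE.
    static int fixedPartition(int hash, int numPartitions) {
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        int hash = -7; // e.g. a custom key whose hashCode() happens to be negative
        System.out.println(buggyPartition(hash, numPartitions)); // prints -3
        System.out.println(fixedPartition(hash, numPartitions)); // prints 1
    }
}
```

On Java 7 the same effect can be had with `int r = hash % n; return r < 0 ? r + n : r;`. A common half-fix, `Math.abs(hash) % n`, still fails when the hash is `Integer.MIN_VALUE`, because `Math.abs(Integer.MIN_VALUE)` is itself negative.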

2015-09-10, Ted Yu
Here is the snippet of ExternalSorter.scala where the ArrayIndexOutOfBoundsException was thrown:

    while (iterator.hasNext) {
      val partitionId = iterator.nextPartition()
      iterator.writeNext(partitionWriters(partitionId))
    }

That means partitionId was negative. Execute the following and examin
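To see why a bad partition ID surfaces as a bare ArrayIndexOutOfBoundsException, here is a plain-Java analogue of that write loop (not Spark's code). The `counts` array stands in for `partitionWriters`, and `checkPartition` is a hypothetical helper showing the kind of up-front range check that yields a more descriptive message; it is not meant to reproduce what PR 8703 does.

```java
// Plain-Java analogue of the ExternalSorter write loop; names are hypothetical.
class PartitionWriteDemo {

    // Hypothetical helper: fail fast with a message naming the bad value
    // instead of letting it surface later as an array index error.
    static int checkPartition(int partitionId, int numPartitions) {
        if (partitionId < 0 || partitionId >= numPartitions) {
            throw new IllegalArgumentException(
                "Partitioner returned " + partitionId
                + ", expected a value in [0, " + numPartitions + ")");
        }
        return partitionId;
    }

    // Counts records per partition; indexing counts[p] plays the role of
    // partitionWriters(partitionId) in the Scala loop.
    static int[] countPerPartition(int[] partitionIds, int numPartitions, boolean validate) {
        int[] counts = new int[numPartitions];
        for (int id : partitionIds) {
            int p = validate ? checkPartition(id, numPartitions) : id;
            counts[p]++; // an out-of-range p throws ArrayIndexOutOfBoundsException here
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] ids = {0, 2, -3, 1}; // -3 mimics the output of a buggy getPartition()
        try {
            countPerPartition(ids, 4, false);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Unvalidated: " + e);
        }
        try {
            countPerPartition(ids, 4, true);
        } catch (IllegalArgumentException e) {
            System.out.println("Validated: " + e.getMessage());
        }
    }
}
```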

2015-09-10, Ashish Shenoy
I am using spark-1.4.1. Here's the skeleton code:

    JavaPairRDD rddPair = rdd.repartitionAndSortWithinPartitions(
            new CustomPartitioner(), new ExportObjectComparator())
        .persist(StorageLevel.MEMORY_AND_DISK_SER());
    ...
    @SuppressWarnings("serial")
    private static class CustomPartitioner exte

2015-09-09, Ted Yu
Which release of Spark are you using? Can you show a skeleton of your partitioner and comparator?

Thanks

> On Sep 9, 2015, at 4:45 PM, Ashish Shenoy wrote:
>
> Hi,
>
> I am trying to sort a RDD pair using repartitionAndSortWithinPartitions() for
> my key [which is a custom class, not a jav

ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions()

2015-09-09, Ashish Shenoy
Hi,

I am trying to sort a pair RDD using repartitionAndSortWithinPartitions() on my key (a custom class, not a Java primitive), with a custom partitioner on that key and a custom comparator. However, it fails consistently:

    org.apache.spark.SparkException: Job aborted due to stage failur
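For readers unfamiliar with the operation: repartitionAndSortWithinPartitions routes every record to the partition chosen by the partitioner and then sorts each partition by key using the supplied comparator. The sketch below is a local, single-JVM model of those semantics only, not Spark's shuffle-based implementation; `repartitionAndSort`, the string keys, and the length-based partition function are all hypothetical choices for illustration. Note that the partition function must return a value in `[0, numPartitions)`, which is exactly where a negative modulo breaks things.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Local model of repartition-then-sort-within-partitions; not Spark's implementation.
class RepartitionAndSortDemo {

    static <K, V> List<List<Map.Entry<K, V>>> repartitionAndSort(
            List<Map.Entry<K, V>> records,
            int numPartitions,
            ToIntFunction<K> getPartition,
            Comparator<K> keyComparator) {
        List<List<Map.Entry<K, V>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        // Route each record by its key; an out-of-range ID fails here
        // with an IndexOutOfBoundsException.
        for (Map.Entry<K, V> record : records) {
            int p = getPartition.applyAsInt(record.getKey());
            partitions.get(p).add(record);
        }
        // Sort each partition independently by key; there is no global order.
        Comparator<Map.Entry<K, V>> byKey =
                (a, b) -> keyComparator.compare(a.getKey(), b.getKey());
        for (List<Map.Entry<K, V>> partition : partitions) {
            partition.sort(byKey);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = new ArrayList<>();
        records.add(new SimpleEntry<>("pear", 1));
        records.add(new SimpleEntry<>("apple", 2));
        records.add(new SimpleEntry<>("fig", 3));
        records.add(new SimpleEntry<>("kiwi", 4));

        int numPartitions = 2;
        // Hypothetical partition function; floorMod keeps the result non-negative.
        ToIntFunction<String> byLength = k -> Math.floorMod(k.length(), numPartitions);

        List<List<Map.Entry<String, Integer>>> result =
                repartitionAndSort(records, numPartitions, byLength, Comparator.naturalOrder());
        System.out.println(result); // prints [[kiwi=4, pear=1], [apple=2, fig=3]]
    }
}
```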