Hi,

According to the documentation (http://spark.apache.org/docs/1.0.0/api/java/index.html), coalesce has the signature

  coalesce(int numPartitions, boolean shuffle, scala.math.Ordering<T> ord)

and returns a new RDD that is reduced into numPartitions partitions. See
http://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/rdd/RDD.html#coalesce(int, boolean, scala.math.Ordering)
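One note on that signature: the Javadoc is generated from the Scala sources, where the Ordering is an implicit parameter with a null default in a *second* parameter list, not a third positional argument. Here is a minimal, self-contained sketch of both call forms, assuming a throwaway local SparkContext and a placeholder RDD of integers:

import org.apache.spark.{SparkConf, SparkContext}

object CoalesceSketch {
  def main(args: Array[String]): Unit = {
    // Throwaway local context, just to make the sketch runnable.
    val sc = new SparkContext(
      new SparkConf().setAppName("coalesce-sketch").setMaster("local[4]"))

    // Placeholder RDD with 200 partitions.
    val rdd = sc.parallelize(1 to 1000, numSlices = 200)

    // The Scala declaration is roughly:
    //   def coalesce(numPartitions: Int, shuffle: Boolean = false)
    //               (implicit ord: Ordering[T] = null): RDD[T]
    // so the ordering can be left off entirely...
    val reduced = rdd.coalesce(Math.min(1000, rdd.partitions.length))

    // ...or supplied explicitly in its own argument list.
    val reducedExplicit =
      rdd.coalesce(Math.min(1000, rdd.partitions.length), shuffle = false)(null)

    println(reduced.partitions.length)         // 200
    println(reducedExplicit.partitions.length) // 200

    sc.stop()
  }
}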
You could try something like the following (note the rdd needs to be declared as an RDD, not a tuple type, for coalesce to be available):

val rdd: RDD[(WrapWithComparable[(Array[Byte], Array[Byte], Array[Byte])],
  Externalizer[KeyValue])] = ...
val rdd_coalesced =
  rdd.coalesce(Math.min(1000, rdd.partitions.length), shuffle = false)(null)

Thanks,
Best Regards

On Thu, Jul 31, 2014 at 7:15 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

> In my code I have something like
>
> val rdd: (WrapWithComparable[(Array[Byte], Array[Byte], Array[Byte])],
> Externalizer[KeyValue]) = ...
> val rdd_coalesced = rdd.coalesce(Math.min(1000, rdd.partitions.length))
>
> My purpose is to limit the number of partitions (a later sortByKey always
> reported a "too many open files" error).
>
> However, it won't compile; the Scala compiler complains "erroneous and
> inaccessible type".
>
> What's the problem? BTW, I found that coalesce requires an implicit
> Ordering; why does it need that?
>
> I'm currently using repartition, which compiles fine, but the doc says it
> always shuffles and recommends using coalesce for reducing partitions.
>
> Can anyone help me here?
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/