Hi

According to the documentation
(http://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/rdd/RDD.html#coalesce(int, boolean, scala.math.Ordering)):

  coalesce(int numPartitions, boolean shuffle, scala.math.Ordering<T> ord)
    Return a new RDD that is reduced into numPartitions partitions.
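Note that the Java doc flattens the Scala signature: on the Scala side the
declaration is curried, roughly like this (going by the 1.0.x RDD source,
where the implicit Ordering defaults to null):

  def coalesce(numPartitions: Int, shuffle: Boolean = false)
              (implicit ord: Ordering[T] = null): RDD[T]

So from Scala, the Ordering goes in a second (implicit) argument list rather
than as a third positional argument.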

You could try something like the following:
val rdd: RDD[(WrapWithComparable[(Array[Byte], Array[Byte], Array[Byte])],
  Externalizer[KeyValue])] = ...
// Pass the Ordering explicitly (here: null) in the second argument list,
// which bypasses the implicit search that was failing.
val rdd_coalesced =
  rdd.coalesce(Math.min(1000, rdd.partitions.length), shuffle = false)(null)
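
If it helps, here is a minimal self-contained sketch of the same call; the
local master, app name, and sample data are placeholders for illustration:

  import org.apache.spark.{SparkConf, SparkContext}

  object CoalesceExample {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(
        new SparkConf().setAppName("coalesce-example").setMaster("local[4]"))
      // Start with more partitions than we want to keep.
      val rdd = sc.parallelize(1 to 100000, numSlices = 2000)
      // Cap the partition count without a shuffle, supplying the Ordering
      // argument explicitly so no implicit search is triggered.
      val coalesced =
        rdd.coalesce(Math.min(1000, rdd.partitions.length), shuffle = false)(null)
      println(s"before: ${rdd.partitions.length}, after: ${coalesced.partitions.length}")
      sc.stop()
    }
  }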

Thanks
Best Regards


On Thu, Jul 31, 2014 at 7:15 AM, Jianshi Huang <jianshi.hu...@gmail.com>
wrote:

> In my code I have something like
>
> val rdd: RDD[(WrapWithComparable[(Array[Byte], Array[Byte], Array[Byte])],
> Externalizer[KeyValue])] = ...
> val rdd_coalesced = rdd.coalesce(Math.min(1000, rdd.partitions.length))
>
> My purpose is to limit the number of partitions (a later sortByKey kept
> failing with a "too many open files" error).
>
> However, it won't compile; the Scala compiler complains about an "erroneous
> and inaccessible type".
>
> What's the problem? BTW, I found that coalesce requires an implicit
> Ordering; why does it need that?
>
> I'm currently using repartition, which compiles fine, but the doc says it
> always shuffles and recommends coalesce for reducing the number of
> partitions.
>
> Can anyone help me here?
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
