Re: Equally split a RDD partition into two partition at the same node

2017-01-16 Thread Pradeep Gollakota
on, I don’t think the RDD's number of partitions will be increased. Thanks, Jasbir. From: Fei Hu [mailto:hufe...@gmail.com] Sent: Sunday, January 15, 2017 10:10 PM To: zouz...@cs.toronto.edu

Re: Equally split a RDD partition into two partition at the same node

2017-01-16 Thread Fei Hu
From: Fei Hu [mailto:hufe...@gmail.com] Sent: Sunday, January 15, 2017 10:10 PM To: zouz...@cs.toronto.edu Cc: user @spark; dev@spark.apache.org Subject: Re: Equally split a RDD partition into two p

Re: Equally split a RDD partition into two partition at the same node

2017-01-16 Thread Fei Hu
Thanks in advance, Fei -- Anastasios Zouzias, azo@.ibm

Re: Equally split a RDD partition into two partition at the same node

2017-01-16 Thread Liang-Chi Hsieh
partitions are required to be on the same node as their parent partition, which helps achieve high data locality. Does anyone know how to implement this, or have any hints? Thanks in advance

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Fei Hu
don’t think the RDD's number of partitions will be increased. Thanks, Jasbir. From: Fei Hu [mailto:hufe...@gmail.com] Sent: Sunday, January 15, 2017 10:10 PM To: zouz...@cs.toronto.edu Cc: user @spark; dev@spark.apache.org Su

RE: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread jasbir.sing
To: zouz...@cs.toronto.edu Cc: user @spark; dev@spark.apache.org Subject: Re: Equally split a RDD partition into two partition at the same node. Hi Anastasios, Thanks for your reply. If I just double numPartitions, how does coalesce(numPartitions: Int, shuffle: Boolean = false

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Fei Hu
Thanks in advance, Fei -- Anastasios Zouzias, azo@.ibm - Liang-Chi Hsieh | @viirya Spark Technology Center http:

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Liang-Chi Hsieh
Liang-Chi Hsieh | @viirya Spark Technology Center http://www.spark.tc/ -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Equally-split-a-RDD-partition-into-two-partition-at-the-same-node-tp20597p20608.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Fei Hu
Hi Anastasios, Thanks for the information. I will look into the CoalescedRDD code. Thanks, Fei. On Sun, Jan 15, 2017 at 12:21 PM, Anastasios Zouzias wrote: Hi Fei, I looked at the code of CoalescedRDD and probably what I suggested will not work. Speaking of which, CoalescedRDD is p

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Anastasios Zouzias
Hi Fei, I looked at the code of CoalescedRDD, and what I suggested will probably not work. Also, CoalescedRDD is private[spark]. If it were not, you could set balanceSlack to 1 and get what you asked for; see https://github.com/apache/spark/blob/branch-1.6/core/src/main/sc
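Since CoalescedRDD is private[spark], one possible alternative (not taken from this thread) is a small custom RDD built on Spark's developer API: a NarrowDependency that maps child partition c to parent partition c / 2, plus a getPreferredLocations override that returns the parent partition's hosts. HalfSplitRDD, HalfPartition, HalfSplitDependency and HalfSplitDemo are hypothetical names. This is a sketch with caveats: each child buffers its whole parent partition in memory to find the midpoint, the parent is computed twice (once per child) unless it is cached, and preferred locations are only scheduling hints (subject to spark.locality.wait), so locality is likely but not guaranteed.

```scala
import scala.reflect.ClassTag
import org.apache.spark.{NarrowDependency, Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition: child `index` holds one half of the parent partition `parent`.
class HalfPartition(val index: Int, val parent: Partition, val firstHalf: Boolean)
  extends Partition

// Hypothetical narrow dependency: child partition c reads only parent partition c / 2.
class HalfSplitDependency[T](rdd: RDD[T]) extends NarrowDependency[T](rdd) {
  override def getParents(partitionId: Int): Seq[Int] = Seq(partitionId / 2)
}

// Hypothetical RDD turning parent partition i into child partitions 2i and 2i + 1.
class HalfSplitRDD[T: ClassTag](prev: RDD[T])
  extends RDD[T](prev.context, Seq(new HalfSplitDependency(prev))) {

  override protected def getPartitions: Array[Partition] =
    firstParent[T].partitions.flatMap { p =>
      Seq[Partition](
        new HalfPartition(2 * p.index, p, firstHalf = true),
        new HalfPartition(2 * p.index + 1, p, firstHalf = false))
    }

  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val hp = split.asInstanceOf[HalfPartition]
    // Buffers the whole parent partition to find its midpoint; without caching,
    // the parent partition is computed once for each of its two children.
    val elems = firstParent[T].iterator(hp.parent, context).toVector
    val (first, second) = elems.splitAt((elems.length + 1) / 2)
    (if (hp.firstHalf) first else second).iterator
  }

  // Ask the scheduler to run both halves where the parent partition lives.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    firstParent[T].preferredLocations(split.asInstanceOf[HalfPartition].parent)
}

object HalfSplitDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("half-split"))
    try {
      val parent = sc.parallelize(1 to 10, 2).cache()
      val halves = new HalfSplitRDD(parent)
      halves.glom().collect().foreach(part => println(part.mkString(", ")))
    } finally sc.stop()
  }
}
```

Usage would be along the lines of new HalfSplitRDD(rdd.cache()); caching the parent avoids computing each parent partition twice.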

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Fei Hu
Hi Anastasios, Thanks for your reply. If I just double numPartitions, how does coalesce(numPartitions: Int, shuffle: Boolean = false) preserve data locality? Do I need to define my own Partitioner? Thanks, Fei. On Sun, Jan 15, 2017 at 3:58 AM, Anastasios Zouzias wrote: Hi
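On the Partitioner question: a Partitioner only takes effect on a key-value RDD, through partitionBy or a shuffle operation, and partitionBy introduces a ShuffleDependency unless the RDD already has that exact partitioner. So a custom Partitioner by itself does not pin each half to its parent's node. A minimal sketch to confirm the shuffle, where PartitionerCheck is a hypothetical name and zipWithIndex is used only to give every element a key:

```scala
import org.apache.spark.{HashPartitioner, ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionerCheck {
  // Keys each element by its position, then redistributes the pairs with a HashPartitioner.
  def keyedAndRepartitioned(sc: SparkContext): RDD[(Long, Int)] =
    sc.parallelize(1 to 240, 24)              // 24 partitions, as in the thread
      .zipWithIndex()                         // (element, index); runs a job to count each partition
      .map(_.swap)                            // (index, element): the index becomes the key
      .partitionBy(new HashPartitioner(48))   // only available on key-value RDDs

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[4]").setAppName("partitioner-check"))
    try {
      val result = keyedAndRepartitioned(sc)
      val shuffled = result.dependencies.head.isInstanceOf[ShuffleDependency[_, _, _]]
      println(s"partitions = ${result.partitions.length}, shuffle dependency = $shuffled")
    } finally sc.stop()
  }
}
```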

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Fei Hu
Hi Rishi, Thanks for your reply! The RDD has 24 partitions, and the cluster has a master node plus 24 compute nodes (12 cores per node). Each node holds one partition, and I want to split each partition into two sub-partitions on the same node to improve parallelism while keeping high data locality
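Back-of-the-envelope arithmetic for the cluster described above, assuming one core per task (Spark's default spark.task.cpus = 1) and no other jobs competing: 24 worker nodes × 12 cores gives 288 cores, so 24 single-task partitions can keep at most about 8% of them busy, and splitting each partition in two raises that to about 17%. ParallelismEstimate is a hypothetical name used only for this sketch.

```scala
object ParallelismEstimate {
  // Figures quoted in the thread.
  val partitions = 24
  val workerNodes = 24
  val coresPerNode = 12

  // Assumption: one core per task (spark.task.cpus = 1) and no other jobs competing.
  val totalCores: Int = workerNodes * coresPerNode

  // Upper bound on the fraction of cores busy when a stage runs `tasks` tasks at once.
  def utilization(tasks: Int): Double =
    math.min(tasks, totalCores).toDouble / totalCores

  // Split factor per partition needed to give every core one task.
  def splitFactorToSaturate: Int = math.ceil(totalCores.toDouble / partitions).toInt

  def main(args: Array[String]): Unit = {
    println(s"total cores: $totalCores")
    println(f"${partitions}%d tasks: ${utilization(partitions) * 100}%.1f%% of cores busy")
    println(f"${partitions * 2}%d tasks: ${utilization(partitions * 2) * 100}%.1f%% of cores busy")
    println(s"split factor to use every core: $splitFactorToSaturate")
  }
}
```

Under the same assumptions, only a split factor of 12 per partition would give every core a task; whether more tasks actually help depends on how CPU-bound the per-partition work is.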

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Anastasios Zouzias
Hi Fei, Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)? https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395 coalesce is mostly used to reduce the number of partitions before writing to HDFS, but it might still be a nar
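A quick local check of Jasbir's point that coalesce with shuffle = false will not increase the partition count. In the CoalescedRDD code the number of output groups is capped at the parent's partition count, so asking for 48 partitions from a 24-partition RDD without a shuffle should leave 24. With shuffle = true (equivalent to repartition(48)) you do get 48, but the data goes through a shuffle and the per-node locality Fei wants is no longer guaranteed. CoalesceCheck and the local[4] master are illustrative choices.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceCheck {
  // Returns (original, coalesced without shuffle, coalesced with shuffle) partition counts.
  def partitionCounts(sc: SparkContext): (Int, Int, Int) = {
    val rdd = sc.parallelize(1 to 240, 24)              // 24 partitions, as in the thread
    val noShuffle = rdd.coalesce(48, shuffle = false)   // narrow: can only merge partitions
    val withShuffle = rdd.coalesce(48, shuffle = true)  // same as rdd.repartition(48)
    (rdd.partitions.length, noShuffle.partitions.length, withShuffle.partitions.length)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[4]").setAppName("coalesce-check"))
    try println(partitionCounts(sc)) finally sc.stop()
  }
}
```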

Equally split a RDD partition into two partition at the same node

2017-01-14 Thread Fei Hu
Dear all, I want to divide an RDD partition equally into two partitions. That is, the first half of the elements in the partition will form one new partition, and the second half will form another new partition. However, the two new partitions are required to be on the sa
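The requested split is easy to state on a plain Scala collection standing in for one partition's elements. A minimal sketch, with HalfSplit and its methods as hypothetical names: child partition i reads parent partition i / 2 and keeps the first half when i is even and the second half when i is odd. With an odd element count the first half gets the extra element, which is an arbitrary choice.

```scala
object HalfSplit {
  // Child partition i of the split reads from parent partition i / 2 ...
  def parentIndex(childIndex: Int): Int = childIndex / 2

  // ... and takes the first half when i is even, the second half when i is odd.
  def isFirstHalf(childIndex: Int): Boolean = childIndex % 2 == 0

  // Split one partition's elements; with an odd count the first half gets the extra element.
  def halves[T](elems: Seq[T]): (Seq[T], Seq[T]) = elems.splitAt((elems.length + 1) / 2)

  // Elements that child partition `childIndex` would contain, given all parent partitions.
  def childElements[T](parents: IndexedSeq[Seq[T]], childIndex: Int): Seq[T] = {
    val (first, second) = halves(parents(parentIndex(childIndex)))
    if (isFirstHalf(childIndex)) first else second
  }
}
```

For example, parents Seq(1, 2, 3, 4) and Seq(5, 6, 7) become four children: (1, 2), (3, 4), (5, 6) and (7).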