Hi Jasbir, Yes, you are right. Do you have any idea about my question?
Thanks, Fei On Mon, Jan 16, 2017 at 12:37 AM, <jasbir.s...@accenture.com> wrote: > Hi, > > > > Coalesce is used to decrease the number of partitions. If you give the > value of numPartitions greater than the current partition, I don’t think > RDD number of partitions will be increased. > > > > Thanks, > > Jasbir > > > > *From:* Fei Hu [mailto:hufe...@gmail.com] > *Sent:* Sunday, January 15, 2017 10:10 PM > *To:* zouz...@cs.toronto.edu > *Cc:* user @spark <u...@spark.apache.org>; dev@spark.apache.org > *Subject:* Re: Equally split a RDD partition into two partition at the > same node > > > > Hi Anastasios, > > > > Thanks for your reply. If I just increase the numPartitions to be twice > larger, how coalesce(numPartitions: Int, shuffle: Boolean = false) keeps > the data locality? Do I need to define my own Partitioner? > > > > Thanks, > > Fei > > > > On Sun, Jan 15, 2017 at 3:58 AM, Anastasios Zouzias <zouz...@gmail.com> > wrote: > > Hi Fei, > > > > How you tried coalesce(numPartitions: Int, shuffle: Boolean = false) ? > > > > https://github.com/apache/spark/blob/branch-1.6/core/ > src/main/scala/org/apache/spark/rdd/RDD.scala#L395 > <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_spark_blob_branch-2D1.6_core_src_main_scala_org_apache_spark_rdd_RDD.scala-23L395&d=DgMFaQ&c=eIGjsITfXP_y-DLLX0uEHXJvU8nOHrUK8IrwNKOtkVU&r=7scIIjM0jY9x3fjvY6a_yERLxMA2NwA8l0DnuyrL6yA&m=bFMBTBwSwMOFRd7Or6fF0sQOH87UIhmuUqEO9UkxPIY&s=qNa3MyvKhIDlXHtxm3s0DZJRZaSWIHpaNhcS86GEQow&e=> > > > > coalesce is mostly used for reducing the number of partitions before > writing to HDFS, but it might still be a narrow dependency (satisfying your > requirements) if you increase the # of partitions. > > > > Best, > > Anastasios > > > > On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu <hufe...@gmail.com> wrote: > > Dear all, > > > > I want to equally divide a RDD partition into two partitions. That means, > the first half of elements in the partition will create a new partition, > and the second half of elements in the partition will generate another new > partition. But the two new partitions are required to be at the same node > with their parent partition, which can help get high data locality. > > > > Is there anyone who knows how to implement it or any hints for it? > > > > Thanks in advance, > > Fei > > > > > > > > -- > > -- Anastasios Zouzias > > > > ------------------------------ > > This message is for the designated recipient only and may contain > privileged, proprietary, or otherwise confidential information. If you have > received it in error, please notify the sender immediately and delete the > original. Any other use of the e-mail by you is prohibited. Where allowed > by local law, electronic communications with Accenture and its affiliates, > including e-mail and instant messaging (including content), may be scanned > by our systems for the purposes of information security and assessment of > internal compliance with Accenture policy. > ____________________________________________________________ > __________________________ > > www.accenture.com >