I don’t think the number of RDD partitions will be increased.

Thanks,
Jasbir

From: Fei Hu [mailto:hufe...@gmail.com]
Sent: Sunday, January 15, 2017 10:10 PM
To: zouz...@cs.toronto.edu
Cc: user @spark ; dev@spark.apache.org
Subject: Re: Equally split a RDD partition into two partition at the same node

Hi Anastasios,

Thanks for your reply. If I just increase the numPartitions to be twice as
large, how does coalesce(numPartitions: Int, shuffle: Boolean = false) keep
the data locality? Do I need to define my own Partitioner?
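A minimal sketch of the behavior in question (assuming a running SparkContext
named sc; the numbers are only illustrative): with shuffle = false, coalesce
builds a narrow dependency and can only merge existing partitions, so asking
for more partitions than the RDD already has leaves the count unchanged, while
shuffle = true does grow the count, at the price of a full shuffle.

    val rdd = sc.parallelize(1 to 1000, numSlices = 24)

    // shuffle = false: narrow dependency, partitions can only be merged.
    val narrow = rdd.coalesce(48, shuffle = false)
    println(narrow.getNumPartitions)    // still 24

    // shuffle = true: the count becomes 48, but a full shuffle moves data
    // across the network and gives up node-local placement.
    val shuffled = rdd.coalesce(48, shuffle = true)
    println(shuffled.getNumPartitions)  // 48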
Hi Anastasios,

Thanks for your information. I will look into the CoalescedRDD code.

Thanks,
Fei

On Sun, Jan 15, 2017 at 12:21 PM, Anastasios Zouzias wrote:

> Hi Fei,
>
> I looked at the code of CoalescedRDD and probably what I suggested will
> not work.
>
> Speaking of which, CoalescedRDD is private[spark].

Hi Fei,
I looked at the code of CoalescedRDD and probably what I suggested will not
work.
Speaking of which, CoalescedRDD is private[spark]. If this was not the
case, you could set balanceSlack to 1, and get what you requested, see
https://github.com/apache/spark/blob/branch-1.6/core/src/main/sc

Hi Anastasios,

Thanks for your reply. If I just increase the numPartitions to be twice as
large, how does coalesce(numPartitions: Int, shuffle: Boolean = false) keep
the data locality? Do I need to define my own Partitioner?

Thanks,
Fei
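A quick way to check what the scheduler actually sees, sketched under the
assumption that there is a SparkContext sc and an existing rdd: the placement
hints are exposed through preferredLocations, so the parent RDD and the
coalesced one can be compared directly.

    val grown = rdd.coalesce(rdd.getNumPartitions * 2, shuffle = false)

    // Print the locality hints Spark will use when scheduling each partition.
    rdd.partitions.foreach { p =>
      println(s"parent    ${p.index} -> ${rdd.preferredLocations(p)}")
    }
    grown.partitions.foreach { p =>
      println(s"coalesced ${p.index} -> ${grown.preferredLocations(p)}")
    }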
Hi Rishi,

Thanks for your reply! The RDD has 24 partitions, and the cluster has a
master node + 24 computing nodes (12 cores per node). Each node will have a
partition, and I want to split each partition into two sub-partitions on the
same node to improve the parallelism and achieve high data locality.
Hi Fei,

Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?

https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395

coalesce is mostly used for reducing the number of partitions before
writing to HDFS, but it might still be a narrow dependency here.
Dear all,

I want to equally divide a RDD partition into two partitions. That means, the
first half of elements in the partition will create a new partition, and the
second half of elements in the partition will generate another new partition.
But the two new partitions are required to be at the same node with their
parent partition, which can help get high data locality.

Is there anyone who knows how to implement it or any hints for it?

Thanks in advance,
Fei
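One way to implement this, sketched under the assumption that buffering a
single parent partition in memory is acceptable: a custom RDD that exposes two
child partitions per parent partition, declares a narrow dependency, and
inherits the parent's preferred locations so both halves prefer the node that
holds the parent data. The class names below are illustrative, not an existing
Spark API.

    import scala.reflect.ClassTag

    import org.apache.spark.{NarrowDependency, Partition, TaskContext}
    import org.apache.spark.rdd.RDD

    // One child partition per half of a parent partition (illustrative name).
    class HalfPartition(val parent: Partition, val firstHalf: Boolean,
                        override val index: Int) extends Partition

    // Splits every parent partition into two child partitions. The dependency
    // is narrow (each child reads exactly one parent partition), and the
    // preferred locations are inherited from the parent, so both halves prefer
    // the node that holds the parent partition.
    class SplitInTwoRDD[T: ClassTag](prev: RDD[T])
      extends RDD[T](prev.context, Seq(new NarrowDependency[T](prev) {
        override def getParents(partitionId: Int): Seq[Int] = Seq(partitionId / 2)
      })) {

      override protected def getPartitions: Array[Partition] =
        prev.partitions.flatMap { p =>
          Seq[Partition](new HalfPartition(p, firstHalf = true, 2 * p.index),
                         new HalfPartition(p, firstHalf = false, 2 * p.index + 1))
        }

      override protected def getPreferredLocations(split: Partition): Seq[String] =
        prev.preferredLocations(split.asInstanceOf[HalfPartition].parent)

      override def compute(split: Partition, context: TaskContext): Iterator[T] = {
        val half = split.asInstanceOf[HalfPartition]
        // Buffer the parent partition to find its midpoint; note that each half
        // reads (or recomputes) the whole parent partition, and that this is
        // memory-hungry for very large partitions.
        val elems = prev.iterator(half.parent, context).toArray
        val mid = elems.length / 2
        (if (half.firstHalf) elems.take(mid) else elems.drop(mid)).iterator
      }
    }

    // Usage: doubles the partition count while keeping each half node-local.
    // val doubled = new SplitInTwoRDD(rdd)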