Hi Jasbir,
Yes, you are right. Do you have any thoughts on my question?
Thanks,
Fei
On Mon, Jan 16, 2017 at 12:37 AM, Jasbir wrote:
> Hi,
>
>
>
> Coalesce is used to decrease the number of partitions. If you give a
> value of numPartitions greater than the current number of partitions, I
> don't think the RDD's number of partitions will be increased.
Hi,
Coalesce is used to decrease the number of partitions. If you give a value of
numPartitions greater than the current number of partitions, I don't think the
RDD's number of partitions will be increased.
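For example, a quick check in spark-shell (a minimal sketch, assuming a
SparkContext named `sc`):

val rdd = sc.parallelize(1 to 100, 8)
rdd.coalesce(4).getNumPartitions                  // 4: shrinking works
rdd.coalesce(16).getNumPartitions                 // still 8: no growth without shuffle
rdd.coalesce(16, shuffle = true).getNumPartitions // 16: shuffling allows growth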
Thanks,
Jasbir
From: Fei Hu [mailto:hufe...@gmail.com]
Sent: Sunday, January 15, 2017 10:10 PM
To: zou
Any updates on the above error, guys?
On Fri, Jan 13, 2017 at 9:35 PM, Josh Elser wrote:
> (-cc dev@phoenix)
>
> phoenix-4.8.2-HBase-1.2-server.jar in the top-level binary tarball of
> Apache Phoenix 4.8.0 is the jar that is meant to be deployed to every
> HBase node's classpath.
>
> I would check t
Hi Liang-Chi,
Yes, you are right. I implemented the following solution for this problem,
and it works, but I am not sure whether it is efficient:
I double the partitions of the parent RDD, and then use the new partitions
and the parent RDD to construct the target RDD. In the compute() function of
the target
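A rough sketch of that idea (hypothetical names, not necessarily the exact
code; each parent partition i is split into children 2*i and 2*i+1 that read
alternating elements, so both children re-iterate the parent partition):

import scala.reflect.ClassTag
import org.apache.spark.{NarrowDependency, Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Child partition that remembers which parent partition it came from.
class SplitPartition(val index: Int, val parent: Partition) extends Partition

class DoublePartitionsRDD[T: ClassTag](prev: RDD[T])
  extends RDD[T](prev.sparkContext, Seq(new NarrowDependency(prev) {
    // Children 2*i and 2*i+1 both depend on parent partition i.
    override def getParents(id: Int): Seq[Int] = Seq(id / 2)
  })) {

  override def getPartitions: Array[Partition] =
    prev.partitions.flatMap { p =>
      Seq(new SplitPartition(2 * p.index, p), new SplitPartition(2 * p.index + 1, p))
    }

  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val s = split.asInstanceOf[SplitPartition]
    // Each child keeps every other element of the parent partition.
    prev.iterator(s.parent, context).zipWithIndex
      .collect { case (x, i) if i % 2 == s.index % 2 => x }
  }

  // Keep each child on the same node as its parent partition, for locality.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    prev.preferredLocations(split.asInstanceOf[SplitPartition].parent)
}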
Hi,
When calling `coalesce` with `shuffle = false`, it will produce at most
min(numPartitions, the previous RDD's number of partitions) partitions. So I
think it can't be used to double the number of partitions.
Anastasios Zouzias wrote
> Hi Fei,
>
> Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?
Hi Sujith,
Thanks for the suggestion.
The code you quoted is from `CollectLimitExec`, which will be in the plan
if a logical `Limit` is the final operator in a logical plan. But in the
physical plan you showed, there are `GlobalLimit` and `LocalLimit` for the
logical `Limit` operation, so the `doExecute`
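For anyone following along, you can check which operators get planned
(a sketch, assuming a Spark 2.x spark-shell with `spark` available):

// Limit as the final operator: the plan should use CollectLimit.
spark.range(100).limit(5).explain()
// Limit followed by another operator: GlobalLimit/LocalLimit instead.
spark.range(100).limit(5).filter("id > 1").explain()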
Hi Anastasios,
Thanks for your information. I will look into the CoalescedRDD code.
Thanks,
Fei
On Sun, Jan 15, 2017 at 12:21 PM, Anastasios Zouzias wrote:
> Hi Fei,
>
> I looked at the code of CoalescedRDD and probably what I suggested will
> not work.
>
> Speaking of which, CoalescedRDD is private[spark].
Hi Fei,
I looked at the code of CoalescedRDD and probably what I suggested will not
work.
Speaking of which, CoalescedRDD is private[spark]. If this were not the
case, you could set balanceSlack to 1 and get what you requested; see
https://github.com/apache/spark/blob/branch-1.6/core/src/main/sc
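If it were public, the call would look roughly like this (a sketch only;
the constructor shape is from branch-1.6, and this will not compile from
user code precisely because the class is private[spark]):

// import org.apache.spark.rdd.CoalescedRDD  // private[spark]
// new CoalescedRDD(parentRDD, maxPartitions = 48, balanceSlack = 1.0)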
Hi Anastasios,
Thanks for your reply. If I just increase numPartitions to twice the current
number, how does coalesce(numPartitions: Int, shuffle: Boolean = false) keep
the data locality? Do I need to define my own Partitioner?
Thanks,
Fei
On Sun, Jan 15, 2017 at 3:58 AM, Anastasios Zouzias wrote:
> Hi
Hi Rishi,
Thanks for your reply! The RDD has 24 partitions, and the cluster has a
master node + 24 computing nodes (12 cores per node). Each node will have a
partition, and I want to split each partition into two sub-partitions on the
same node to improve the parallelism and achieve high data locality.
Hi,
Will it be a problem if the staging directory is already deleted? Because
even if the directory doesn't exist, fs.delete(stagingDirPath, true) won't
cause a failure but will just return false.
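A quick illustration of that behavior (a minimal sketch; the path is
hypothetical):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val stagingDirPath = new Path("/tmp/spark-staging-example") // hypothetical path
// Returns false, rather than throwing, when the path is already gone.
val deleted = fs.delete(stagingDirPath, true)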
Rostyslav Sotnychenko wrote
> Hi all!
>
> I am a bit confused why Spark AM and Client are both trying to delete the
> staging directory.
As you mentioned, it's called in ForeachSink. I don't know that the scaladoc
is wrong. You're saying something else: that there's no such thing as local
execution. I confess I don't know whether that's true, but the doc isn't
wrong in that case, really.
More broadly, I just don't think this type of thi
Hi Fei,
Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?
https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395
coalesce is mostly used for reducing the number of partitions before
writing to HDFS, but it might still be a narrow dependency.
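One way to check whether a shuffle was introduced is to inspect the lineage
(a minimal sketch, assuming spark-shell with `sc`):

val rdd = sc.parallelize(1 to 100, 8)
println(rdd.coalesce(4).toDebugString)                  // CoalescedRDD directly over the parent
println(rdd.coalesce(4, shuffle = true).toDebugString)  // a ShuffledRDD appears in the lineage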