>
> From: Aaron Davidson
> Reply-To:
> Date: Sat, 12 Jul 2014 16:32:22 -0700
> To:
> Subject: Re: Confused by groupByKey() and the default partitioner
>
> Yes, groupByKey() does partition by the hash of the key unless you specify
> a custom Partitioner.
>
> (1) If you were to use groupByKey() when the data was already partitioned
> correctly, the data would indeed not be shuffled. Here is the associated
> code; you'll see that it simply checks that the Partitioner [...]

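The check described above can be sketched in plain Python. This is an illustrative stand-in, not Spark's actual implementation (Spark's `HashPartitioner` is Scala code working on `key.hashCode` with a non-negative mod, and `groupByKey()` compares the RDD's existing partitioner against the target one before deciding to shuffle):

```python
# Plain-Python sketch of hash partitioning and the "skip the shuffle if
# the partitioner already matches" check. Names here are hypothetical.

class HashPartitioner:
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def get_partition(self, key):
        # hash(key) % n is non-negative in Python for positive n,
        # mirroring Spark's non-negative mod of key.hashCode
        return hash(key) % self.num_partitions

    def __eq__(self, other):
        return (isinstance(other, HashPartitioner)
                and self.num_partitions == other.num_partitions)

def needs_shuffle(current_partitioner, target_partitioner):
    # groupByKey() only repartitions when the partitioners differ;
    # an unpartitioned RDD (current_partitioner is None) must shuffle
    return current_partitioner != target_partitioner

print(needs_shuffle(None, HashPartitioner(2)))                  # True
print(needs_shuffle(HashPartitioner(2), HashPartitioner(2)))    # False
```

The key point is the equality check: if the RDD was already partitioned with an equivalent partitioner, every key is guaranteed to be on the right machine, so no data movement is needed.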
Hi:
I have trouble understanding the default partitioner (hash) in Spark.
Suppose that an RDD with two partitions is created as follows:
x = sc.parallelize([("a", 1), ("b", 4), ("a", 10), ("c", 7)], 2)
Does Spark partition x based on the hash of the key (e.g., "a", "b", "c") by
default?
(1) Assuming [...]
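To make the question concrete: `sc.parallelize` by itself just slices the list positionally into the requested number of partitions; the hash-based assignment described in the reply happens when `groupByKey()` shuffles. The assignment for the four records above can be sketched in plain Python (PySpark actually uses its own `portable_hash`, so the exact split may differ from this sketch, but the invariant is the same):

```python
# Sketch of hash partitioning for the question's records. The actual
# partition each key lands in depends on the hash function; what is
# guaranteed is that equal keys always land in the same partition.

records = [("a", 1), ("b", 4), ("a", 10), ("c", 7)]
num_partitions = 2

partitions = {i: [] for i in range(num_partitions)}
for key, value in records:
    partitions[hash(key) % num_partitions].append((key, value))

# Both ("a", 1) and ("a", 10) end up together, since the assignment
# depends only on the key.
print(partitions)
```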