Thank you, that helps a lot.

On Mon, Feb 22, 2016 at 6:01 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
> You're correct, reduceByKey is just an example.
>
> On Tue, Feb 23, 2016 at 10:57 AM, Jay Luan <jaylu...@gmail.com> wrote:
>
>> Could you elaborate on how this would work?
>>
>> So from what I can tell, this maps a key to a tuple which always has a 0
>> as the second element. From there the hash widely changes because we now
>> hash something like ((1,4), 0) and ((1,3), 0). Thus mapping this would
>> create more even partitions. Why reduce by key after? Is that just an
>> example of an operation that can be done? Or does it provide some kind of
>> real value to the operation.
>>
>> On Mon, Feb 22, 2016 at 5:48 PM, Takeshi Yamamuro <linguin....@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> How about adding dummy values?
>>> values.map(d => (d, 0)).reduceByKey(_ + _)
>>>
>>> On Tue, Feb 23, 2016 at 10:15 AM, jluan <jaylu...@gmail.com> wrote:
>>>
>>>> I was wondering, is there a way to force something like the hash
>>>> partitioner to use the entire entry of a PairRDD as a hash rather than
>>>> just the key?
>>>>
>>>> For Example, if we have an RDD with values: PairRDD = [(1,4), (1, 3),
>>>> (2, 3), (2,5), (2, 10)]. Rather than using keys 1 and 2, can we force
>>>> the partitioner to hash the entire tuple such as (1,4)?
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Force-Partitioner-to-use-entire-entry-of-PairRDD-as-key-tp26299.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>
> --
> ---
> Takeshi Yamamuro
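
For anyone reading this thread later, here is a rough plain-Scala sketch (no Spark dependency) of why the dummy-value trick changes the partitioning. The object name and the partitionOf helper are made up for illustration. The helper assumes the partitioner takes key.hashCode modulo the number of partitions and keeps the result non-negative, which is my understanding of what HashPartitioner does. The partition count of 4 is arbitrary.

```scala
// Plain-Scala sketch: how the choice of key affects hash partitioning.
object WholeEntryPartitioning {
  // Hypothetical helper, assumed to mirror a hash partitioner:
  // key.hashCode modulo numPartitions, shifted to stay non-negative.
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  val numPartitions = 4 // arbitrary choice for the example
  val pairs: Seq[(Int, Int)] = Seq((1, 4), (1, 3), (2, 3), (2, 5), (2, 10))

  // Default behaviour: only the first element of each pair is hashed,
  // so every record with the same key lands in the same partition.
  val byKey: Seq[Int] = pairs.map { case (k, _) => partitionOf(k, numPartitions) }

  // The suggested trick: the whole tuple becomes the key, with a dummy 0 value.
  val rekeyed: Seq[((Int, Int), Int)] = pairs.map(d => (d, 0))

  // Now the hash is computed over the full (key, value) tuple.
  val byEntry: Seq[Int] = rekeyed.map { case (k, _) => partitionOf(k, numPartitions) }

  def main(args: Array[String]): Unit = {
    println(s"partitions when hashing the key only:    $byKey")
    println(s"partitions when hashing the whole tuple: $byEntry")
  }
}
```

In Spark itself, the corresponding step would presumably be something like values.map(d => (d, 0)).partitionBy(new HashPartitioner(n)) rather than reduceByKey, since reduceByKey also merges identical entries into one record.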