I'm also interested in this.
Does the partition count of the resulting DataFrame depend on the groupBy fields?
Also, is the performance of groupBy-agg comparable to reduceByKey/aggregateByKey?
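
As far as I understand, the post-shuffle partition count of a DataFrame groupBy-agg is controlled by spark.sql.shuffle.partitions (default 200), not by the grouping columns. A minimal sketch to check this, assuming a Spark 2.x SparkSession and hypothetical columns "k" and "v":

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    val spark = SparkSession.builder()
      .appName("groupby-partition-count")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy data with a key column "k" and a value column "v".
    val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("k", "v")

    // groupBy-agg triggers a shuffle; the number of post-shuffle partitions
    // comes from spark.sql.shuffle.partitions, regardless of how many
    // distinct values of "k" exist.
    val agged = df.groupBy("k").agg(sum("v").as("total"))
    println(agged.rdd.getNumPartitions)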
... a larger or smaller number of tasks to avoid OutOfMemoryError.
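
If the point above is about the number of shuffle tasks, one knob that applies to DataFrame groupBy/join is spark.sql.shuffle.partitions. A minimal sketch of tuning it, assuming a Spark 2.x SparkSession (names are illustrative):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("shuffle-partition-tuning")
      .master("local[*]")
      // More shuffle tasks means less data per task, which can help avoid
      // OutOfMemoryError on large or skewed shuffles...
      .config("spark.sql.shuffle.partitions", "1000")
      .getOrCreate()

    // ...and the setting can also be lowered at runtime when the data is
    // small and 200 tasks would be wasteful.
    spark.conf.set("spark.sql.shuffle.partitions", "16")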
Hello there,
I may have a naive question on join / groupBy-agg. In the days of RDDs, whenever I wanted to perform
a. a groupBy-agg, I used reduceByKey (of PairRDDFunctions) with an optional partition strategy, i.e. a number of partitions or a Partitioner (see the sketch below)
b. a join (of PairRDDFunctions)
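
For reference, a minimal sketch of the RDD-era pattern described in (a) and (b), using hypothetical toy data and an assumed Spark 2.x SparkSession:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("rdd-era-pattern")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical key-value data.
    val left  = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    val right = sc.parallelize(Seq(("a", "x"), ("b", "y")))

    // a. groupBy-agg via reduceByKey, with an explicit number of partitions...
    val summed = left.reduceByKey(_ + _, 8)
    // ...or with an explicit Partitioner.
    val summedByPartitioner = left.reduceByKey(new HashPartitioner(8), _ + _)

    // b. join, which likewise accepts a number of partitions (or a Partitioner).
    val joined = left.join(right, 8)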