Grouping is applied in the aggregation.
From: holden.ka...@gmail.com [mailto:holden.ka...@gmail.com] On Behalf Of
Holden Karau
Sent: Thu, Mar 10, 2016 13:56
To: Gerhard Fiedler
Cc: user@spark.apache.org
Subject: Re: Partitioning to speed up processing?
Are they entire data set aggregates or is
I have a number of queries that result in a sequence Filter > Project >
Aggregate. I wonder whether partitioning the input table makes sense.
Does Aggregate benefit from a partitioned input? If so, what partitions would
be most useful (related to the aggregations)?
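
As a rough illustration of why the partitioning key matters for the Aggregate question above, here is a minimal plain-Python sketch (not Spark code). It assumes the rows are hash-partitioned by the same key the aggregation groups on, and that the engine knows this. The helper names `hash_partition` and `aggregate_partition` are hypothetical and exist only for this example.

```python
from collections import defaultdict


def hash_partition(rows, key_index, num_partitions):
    """Assign each row to a partition by hashing its grouping key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key_index]) % num_partitions].append(row)
    return partitions


def aggregate_partition(partition, key_index, value_index):
    """Sum values per key using only the rows of a single partition."""
    totals = defaultdict(int)
    for row in partition:
        totals[row[key_index]] += row[value_index]
    return dict(totals)


# Toy input: (grouping key, value) pairs.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

partitions = hash_partition(rows, key_index=0, num_partitions=3)
partials = [aggregate_partition(p, 0, 1) for p in partitions]

# Every key lives in exactly one partition, so the partial results never
# overlap and can be combined without re-aggregating across partitions.
result = {}
for partial in partials:
    result.update(partial)
```

If the data were partitioned on some other column, rows for the same key could sit in several partitions, and the partial sums would have to be merged across partitions first. That is the step a partitioning aligned with the grouping key is meant to avoid.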
Do Filter and Project preserv
/create-cluster.html)
doesn’t have a similar argument.
Gerhard
From: Sonal Goyal [mailto:sonalgoy...@gmail.com]
Sent: Wed, Mar 09, 2016 04:28
To: Wang, Daoyuan
Cc: Gerhard Fiedler; user@spark.apache.org
Subject: Re: How to add a custom jar file to the Spark driver?
Hi Gerhard,
I just stumbled upon
We're running Spark 1.6.0 on EMR, in YARN client mode. We run Python code, but
we want to add a custom jar file to the driver.
When running on a local one-node standalone cluster, we just use
spark.driver.extraClassPath and everything works:
spark-submit --conf spark.driver.extraClassPath=/path