Hi,
With SPARK-11157, the big fat assembly jar build was removed.
Has anyone used spark.yarn.archive, the alternative it provides, to
successfully deploy Spark on a YARN cluster? If so, what does the archive
contain, and what would be the minimal set of jars? Any suggestion is greatly
appreciated.
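To make the question concrete, here is a sketch of what I have in mind based
on my reading of the docs (the archive name and HDFS path below are
placeholders, not a tested setup):

// Beforehand, outside Spark:
//   jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
//   hdfs dfs -put spark-libs.jar /user/spark/
// spark.yarn.archive is usually set via spark-defaults.conf or --conf,
// but it can also be set before the context starts, e.g.:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("yarn-archive-test")
  .config("spark.yarn.archive", "hdfs:///user/spark/spark-libs.jar")
  .getOrCreate()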
Thanks
Hi Maciek,
I've tested several variants for summing "fieldToSum":
First, RDD-style code:
df.as[A].map(_.fieldToSum).reduce(_ + _)
df.as[A].rdd.map(_.fieldToSum).sum()
df.as[A].map(_.fieldToSum).rdd.sum()
All take around 30 seconds. "reduce" and "sum" seem to have the same
performance for this use case.
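For reference, a self-contained version of what I'm running looks roughly
like this (the case class and the synthetic input are simplified stand-ins
for my real job):

import org.apache.spark.sql.SparkSession

case class A(fieldToSum: Long)

object SumVariants {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sum-variants").getOrCreate()
    import spark.implicits._

    // Synthetic stand-in for the real DataFrame in this thread.
    val df = spark.range(0L, 100000000L).map(i => A(i)).toDF()

    val r1 = df.as[A].map(_.fieldToSum).reduce(_ + _) // typed Dataset reduce
    val r2 = df.as[A].rdd.map(_.fieldToSum).sum()     // convert to RDD, then sum
    val r3 = df.as[A].map(_.fieldToSum).rdd.sum()     // map on Dataset, then RDD sum

    println(s"reduce=$r1 rddSum=$r2 mapThenRddSum=$r3")
    spark.stop()
  }
}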
Hi Julien,
I thought about something like this:
import org.apache.spark.sql.functions.sum

df.as[A].map(_.fieldToSum).agg(sum("value")).collect()
The idea is to use DataFrame aggregation on the Dataset instead of reduce.
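(After map(_.fieldToSum), the single column of the resulting Dataset is named
"value" by default, hence sum("value").) For comparison, a fully untyped
variant that skips the map and its encoder round-trip, assuming fieldToSum is
also a column of df (my assumption, using the same import as above):

df.agg(sum("fieldToSum")).collect()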
Regards,
Maciek