We had disabled Tungsten after we found a few performance issues, but had to re-enable it because we found that, with a large number of GROUP BY fields, the shuffle kept failing when Tungsten was disabled.
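For reference, here is a minimal sketch of the switch involved. The post does not name the setting, so this assumes Tungsten was toggled through the Spark 1.5 SQL property spark.sql.tungsten.enabled, which defaults to true. Setting it to false falls back to the older, non-Tungsten execution path for DataFrame/SQL queries:

```properties
# spark-defaults.conf (Spark 1.5.x)
# Assumption: this is the Tungsten switch referred to in this post.
# Default is true; false disables the Tungsten execution backend for Spark SQL/DataFrames.
spark.sql.tungsten.enabled   false
```

The same key can also be passed per job with spark-submit --conf spark.sql.tungsten.enabled=false, or per session with a SQL SET spark.sql.tungsten.enabled=false statement.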
Here is an excerpt from the analysis by one of our engineers:

With Tungsten enabled (the default in Spark 1.5), on ~90 files of 0.5 GB each:
- Ingest (after applying broadcast lookups): 54 min
- Aggregation (~30 fields in the GROUP BY and another 40 in the aggregation): 18 min

With Tungsten disabled:
- Ingest: 30 min
- Aggregation: errors out

In smaller tests, we found that joins are slower with Tungsten enabled. With GROUP BY, however, disabling Tungsten is not an option in the first place, because the aggregation fails.

Hope this helps.
-Charmee

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/If-you-use-Spark-1-5-and-disabled-Tungsten-mode-tp14604p14711.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.