We disabled Tungsten after running into a few performance issues, but had to
re-enable it: with a large number of GROUP BY fields, the shuffle keeps
failing whenever Tungsten is disabled.

Here is an excerpt from one of our engineers with his analysis. 

With Tungsten enabled (default in Spark 1.5):
~90 files of 0.5 GB each:

Ingest (after applying broadcast lookups): 54 min
Aggregation (~30 fields in GROUP BY and another 40 in aggregation): 18 min

With Tungsten disabled:

Ingest: 30 min
Aggregation: fails with errors

In smaller tests we found that joins are slow with Tungsten enabled. With
GROUP BY, however, disabling Tungsten does not work at all.
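
For anyone who wants to reproduce the comparison, here is a minimal sketch of
how the toggle might be set from application code. It assumes the switch in
question is the Spark 1.5 SQL setting spark.sql.tungsten.enabled (the analysis
above does not name the exact property), and the object and application names
are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical driver; the master URL is expected to come from spark-submit.
object TungstenToggle {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("tungsten-toggle") // hypothetical app name
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Assumption: spark.sql.tungsten.enabled is the relevant Spark 1.5 switch
    // (enabled by default). Setting it to "false" disables Tungsten for
    // queries run through this SQLContext.
    sqlContext.setConf("spark.sql.tungsten.enabled", "false")

    // Read the setting back to confirm what the session will use.
    println(sqlContext.getConf("spark.sql.tungsten.enabled"))

    sc.stop()
  }
}
```

The same property can also be passed at submit time with
--conf spark.sql.tungsten.enabled=false, which avoids a code change between the
two runs.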

Hope this helps. 

-Charmee


