Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread atootoonchian
I created an issue in the Spark project: SPARK-14820 -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-Reduce-Shuffle-Data-by-pushing-filter-toward-storage-tp17297p17306.html Sent from the Apache Spark Developers List mailing list archive at Nabble.

Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread atootoonchian
Hi Marcin, I attached a PDF version of the issue: Reduce_Shuffle_Data_by_pushing_filter_toward_storage.pdf

[Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread atootoonchian
The SQL query planner could be made smart enough to push filter operations down toward the storage layer. If the planner is optimized so that IO to storage is reduced at the cost of running multiple filters (i.e., extra compute), the trade-off should be desirable when the system is IO bound. An example to pr
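The trade-off described in this message can be sketched with a toy model in plain Python. This is not Spark's planner and is not taken from SPARK-14820; the table names, the `shuffle_by_key` helper, and the row counts are all assumptions made for illustration. It assumes the filter predicate only references columns of the left table, so applying it before the shuffle does not change the join result.

```python
from collections import defaultdict


def shuffle_by_key(rows, key):
    """Group rows by join key, mimicking a shuffle. Returns (buckets, rows_moved)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row)
    return buckets, len(rows)


def hash_join(left_buckets, right_buckets):
    """Inner join of two already-shuffled inputs."""
    return [
        {**l, **r}
        for k, left_rows in left_buckets.items()
        for l in left_rows
        for r in right_buckets.get(k, [])
    ]


def join_then_filter(left, right, key, pred):
    """Shuffle everything, join, then filter (filter applied late)."""
    lb, l_moved = shuffle_by_key(left, key)
    rb, r_moved = shuffle_by_key(right, key)
    result = [row for row in hash_join(lb, rb) if pred(row)]
    return result, l_moved + r_moved


def filter_then_join(left, right, key, pred):
    """Filter the left input first, then shuffle and join (filter pushed down)."""
    filtered = [row for row in left if pred(row)]  # extra compute before the shuffle
    lb, l_moved = shuffle_by_key(filtered, key)
    rb, r_moved = shuffle_by_key(right, key)
    return hash_join(lb, rb), l_moved + r_moved


# Hypothetical data: 100 orders spread over 10 customers.
orders = [{"cust": i % 10, "amount": i} for i in range(100)]
customers = [{"cust": c, "region": "r%d" % (c % 3)} for c in range(10)]


def is_large(row):
    return row["amount"] >= 90


late_result, late_moved = join_then_filter(orders, customers, "cust", is_large)
early_result, early_moved = filter_then_join(orders, customers, "cust", is_large)

print("rows shuffled, filter after join: ", late_moved)
print("rows shuffled, filter before join:", early_moved)
```

With this toy data both plans return the same ten joined rows, but the late filter moves 110 rows through the simulated shuffle while the early filter moves 20. The saving comes from evaluating the predicate on every left row before the shuffle, which is the compute-for-IO exchange the message argues for.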

Improving system design logging in spark

2016-04-20 Thread atootoonchian
The current Spark logging mechanism can be improved by adding the following parameters. They will help in understanding system bottlenecks and give Spark application developers useful guidelines for designing optimized applications. 1. Shuffle Read Local Time: Time for a task to read shuffle data fr