Using Spark 2.2.0 SparkSession extensions to optimize file filtering

2017-10-24 Thread Chris Luby
I have an external catalog that has additional information on my Parquet files that I want to match up with the parsed filters from the plan to prune the list of files included in the scan. I’m looking at doing this using the Spark 2.2.0 SparkSession extensions similar to the built in partition

(SPARK-22148) TaskSetManager.abortIfCompletelyBlacklisted should not abort when all current executors are blacklisted but dynamic allocation is enabled

2017-10-24 Thread Juan Rodríguez Hortalá
Hi, I've been working on this issue, and I would like to get your feedback on the following approach. The idea is that instead of failing in `TaskSetManager.abortIfCompletelyBlacklisted`, when a task cannot be scheduled in any executor but dynamic allocation is enabled, we will register this task