Hi,

We had a Spark-a-thon in Warsaw, Poland [1] where we set out to learn the QueryPlan API. My initial idea was to start with the Analyzer and register a custom Rule[LogicalPlan] using extendedResolutionRules [2].
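For context, this is roughly what we were trying to wire in — a sketch against the Spark 2.1-era Catalyst API; MyResolutionRule and MyAnalyzer are made-up names for illustration, not anything in Spark itself:

```scala
import org.apache.spark.sql.catalyst.CatalystConf
import org.apache.spark.sql.catalyst.analysis.Analyzer
import org.apache.spark.sql.catalyst.catalog.SessionCatalog
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// A do-nothing resolution rule: logs the plan it sees and
// returns it unchanged. (Hypothetical example.)
object MyResolutionRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    logInfo(s"MyResolutionRule saw:\n$plan")
    plan
  }
}

// Registering the rule means subclassing Analyzer and overriding
// extendedResolutionRules -- there is no setter to call.
class MyAnalyzer(catalog: SessionCatalog, conf: CatalystConf)
  extends Analyzer(catalog, conf) {

  override val extendedResolutionRules: Seq[Rule[LogicalPlan]] =
    MyResolutionRule :: Nil
}
```

The catch, as described below, is that nothing hands you a hook to make SparkSession use MyAnalyzer, which is why we ended up overriding the whole SparkSession/SessionState stack.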
We were glad to eventually find the scaladoc — "Override to provide additional rules for the "Resolution" batch." — after we got stuck on how to register the custom rule. It turned out to be very different from what you can do with ExperimentalMethods [3], which offers two extension points for the query planner (i.e. SparkPlanner and SparkOptimizer).

We ended up overriding SparkSession, SessionState, and Analyzer, which was an excellent coding exercise for the sparkathon, but it could be too much given the simplicity of ExperimentalMethods for the query planner. See our solution: https://gist.github.com/loostro/5a14dbca9b52b841cb97d17ba952943f

Can we do better? Could we have done it using other extension points to the Analyzer? Are there any plans to open up the Analyzer (akin to SparkPlanner)? Why? When? That could be another excellent coding exercise for a sparkathon, couldn't it?

Please help.

[1] https://www.meetup.com/WarsawScala/events/238558617/
[2] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L107
[3] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/ExperimentalMethods.scala

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
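P.S. For comparison, this is all that ExperimentalMethods asks of you — a sketch against Spark 2.x; MyStrategy and MyOptimizerRule are made-up pass-through examples, not part of Spark:

```scala
import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

// A planning strategy that declines every plan, deferring to
// the built-in strategies. (Hypothetical example.)
object MyStrategy extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

// An optimizer rule that returns the plan unchanged.
// (Hypothetical example.)
object MyOptimizerRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

val spark = SparkSession.builder.master("local[*]").getOrCreate()

// Both extension points are plain vars on spark.experimental --
// no subclassing of SparkSession or SessionState required.
spark.experimental.extraStrategies = MyStrategy :: Nil
spark.experimental.extraOptimizations = MyOptimizerRule :: Nil
```

That asymmetry between the planner/optimizer and the Analyzer is exactly what prompted the questions above.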