Hi, I've just noticed the new "avg hash probe" metric for the HashAggregateExec operator [1].
My understanding (after briefly going through the code in the change and around it) is as follows:

    Average hash map probes per lookup (i.e. `numProbes` / `numKeyLookups`)

NOTE: `numProbes` and `numKeyLookups` are fields of the BytesToBytesMap append-only hash map that track the total number of probe iterations needed to look up keys and the total number of key lookups, respectively.

Does this description explain the purpose of the metric? Anything important to add?

Given the way it's calculated and the meaning of `numProbes`, the higher `numProbes` is, the worse. When would that happen? Why is this important for joins? Why does it affect HashAggregateExec? Does it have anything to do with broadcast joins?

I'd appreciate any help on this. I promise to share the answers with the Spark community in my gitbook or any other place you point me to :)

Thanks!

[1] https://github.com/apache/spark/commit/18066f2e61f430b691ed8a777c9b4e5786bf9dbc

Best regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
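P.S. To check that I read the counters right, here is a tiny sketch of how I understand the probing works. It is my own toy open-addressing (linear-probing) map, not Spark's BytesToBytesMap; only the counter names `numProbes` and `numKeyLookups` mirror the real fields. Each lookup bumps `numKeyLookups` once and `numProbes` once per slot inspected, so the ratio is 1.0 when there are no collisions and grows as collision chains get longer:

```java
// Toy linear-probing hash map that counts probes the way I believe
// BytesToBytesMap does. This is a simplification for illustration only.
public class ProbeCounter {
    private final int[] keys;   // 0 marks an empty slot (keys must be non-zero)
    private final int capacity;
    long numProbes = 0L;        // total slots inspected across all lookups
    long numKeyLookups = 0L;    // total lookups performed

    ProbeCounter(int capacity) {
        this.capacity = capacity;
        this.keys = new int[capacity];
    }

    /** Insert-or-find: returns the slot index finally used for the key. */
    int lookup(int key) {
        numKeyLookups++;
        int pos = Math.floorMod(Integer.hashCode(key), capacity);
        while (true) {
            numProbes++;                              // one probe per slot inspected
            if (keys[pos] == 0 || keys[pos] == key) { // empty or matching slot: done
                keys[pos] = key;
                return pos;
            }
            pos = (pos + 1) % capacity;               // collision: try the next slot
        }
    }

    /** The "avg hash probe" ratio: 1.0 is ideal, higher means more collisions. */
    double avgHashProbe() {
        return (double) numProbes / numKeyLookups;
    }

    public static void main(String[] args) {
        ProbeCounter m = new ProbeCounter(8);
        for (int k = 1; k <= 5; k++) {
            m.lookup(k);  // keys 1..5 land in distinct slots, no collisions
        }
        System.out.println(m.avgHashProbe());  // prints 1.0
    }
}
```

If that reading is right, a high average would come from many keys hashing into the same region of the map, so each lookup has to walk past occupied slots before finding its key.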