Hi,

I've just noticed the new "avg hash probe" metric for HashAggregateExec
operator [1].

My understanding (after briefly going through the code in the change and
around it) is as follows:

Average number of hash map probes per key lookup (i.e. `numProbes` /
`numKeyLookups`)

NOTE: `numProbes` and `numKeyLookups` are counters maintained by the
append-only BytesToBytesMap hash map: the total number of probe iterations
across all key lookups and the total number of key lookups, respectively.

Does this description explain the purpose of the metric? Anything important
to add?

Given the way it's calculated and the meaning of `numProbes`, the higher
the average (i.e. the more probe iterations per lookup) the worse. When
would that happen? Why is this important for joins? Why does this affect
HashAggregateExec? Does this have anything to do with broadcast joins?

I'd appreciate any help on this. I promise to share it with the Spark
community in my gitbook or any other place you point to :) Thanks!

[1]
https://github.com/apache/spark/commit/18066f2e61f430b691ed8a777c9b4e5786bf9dbc

Best regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski