Zhenhua Wang created SPARK-18000:
------------------------------------
Summary: Aggregation function for computing endpoints for numeric histograms
Key: SPARK-18000
URL: https://issues.apache.org/jira/browse/SPARK-18000
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 2.1.0
Reporter: Zhenhua Wang
For a column of numeric type (including date and timestamp), we will generate an
equi-width or equi-height histogram, depending on whether its number of distinct
values (ndv) is larger than the maximum number of bins allowed in one histogram
(denoted as numBins).
This aggregate function counts values and their frequencies in a small hashmap
whose size is at most "numBins". If the number of distinct values stays within
that limit, it returns an equi-width histogram.
When the size of the hashmap exceeds "numBins", it clears the hashmap and uses
ApproximatePercentile to return the endpoints of an equi-height histogram.
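
The two-phase behaviour described above can be sketched roughly as follows. This
is only an illustration in Python, not the proposed Spark implementation: the
function name histogram_endpoints is hypothetical, an exact sort with
nearest-rank percentiles stands in for ApproximatePercentile, and the sketch
assumes every input value (including those seen before the hashmap overflowed)
is fed to the percentile step.

```python
def histogram_endpoints(values, num_bins):
    """Hypothetical sketch: return ("equi-width", {value: count}) while the
    number of distinct values fits in num_bins, otherwise
    ("equi-height", [endpoints]) with num_bins + 1 endpoints."""
    counts = {}          # small hashmap: value -> frequency
    overflowed = False   # set once the hashmap would exceed num_bins entries
    seen = []            # stand-in for a percentile sketch (keeps all values)

    for v in values:
        seen.append(v)
        if not overflowed:
            counts[v] = counts.get(v, 0) + 1
            if len(counts) > num_bins:
                # Too many distinct values: drop the hashmap and switch modes.
                counts.clear()
                overflowed = True

    if not overflowed:
        return "equi-width", dict(sorted(counts.items()))

    # Exact nearest-rank percentiles at 0, 1/num_bins, ..., 1.
    # A real implementation would use an approximate, bounded-memory sketch.
    seen.sort()
    n = len(seen)
    endpoints = []
    for i in range(num_bins + 1):
        rank = (i * n + num_bins - 1) // num_bins  # ceil(i * n / num_bins)
        endpoints.append(seen[max(rank - 1, 0)])
    return "equi-height", endpoints


if __name__ == "__main__":
    print(histogram_endpoints([1, 2, 2, 3, 3, 3], 4))
    print(histogram_endpoints(list(range(1, 101)), 4))
```

With numBins = 4, the first call stays in the hashmap phase because only three
distinct values appear, while the second overflows on the fifth distinct value
and falls back to percentile endpoints.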