[jira] [Updated] (SPARK-18000) Aggregation function for computing endpoints for histograms

Zhenhua Wang (JIRA) Tue, 25 Oct 2016 20:26:07 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-18000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Zhenhua Wang updated SPARK-18000:
---------------------------------
    Summary: Aggregation function for computing endpoints for histograms  (was: 
Aggregation function for computing endpoints for numeric histograms)

> Aggregation function for computing endpoints for histograms
> -----------------------------------------------------------
>
>                 Key: SPARK-18000
>                 URL: https://issues.apache.org/jira/browse/SPARK-18000
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Zhenhua Wang
>
> For a column of numeric type (including date and timestamp), we will generate 
> an equi-width or equi-height histogram, depending on whether its number of 
> distinct values (ndv) is larger than the maximum number of bins allowed in 
> one histogram (denoted as numBins).
> This aggregate function computes values and their frequencies using a small 
> hashmap, whose size is less than or equal to "numBins", and returns an 
> equi-width histogram. 
> When the size of the hashmap exceeds "numBins", it clears the hashmap and 
> uses ApproximatePercentile to return the endpoints of an equi-height histogram.
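
The two-phase behavior described above can be sketched roughly as follows. This is only an illustration in Python, not the Spark implementation: the function name `histogram_endpoints`, its return shape, and the exact sort-based percentile step are all assumptions made for the example. In particular, the sort stands in for ApproximatePercentile, which in Spark computes approximate percentiles without holding all values.

```python
from typing import Dict, List, Sequence, Tuple, Union

Number = Union[int, float]
Result = Tuple[str, Union[Dict[Number, int], List[Number]]]


def histogram_endpoints(values: Sequence[Number], num_bins: int) -> Result:
    """Hypothetical sketch of the aggregate described in the issue.

    Returns ("frequencies", {value: count}) while the number of distinct
    values stays within num_bins, otherwise ("endpoints", [...]) with
    num_bins + 1 endpoints of an equi-height histogram.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")

    # Phase 1: count values in a small hashmap bounded by num_bins entries.
    counts: Dict[Number, int] = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
        if len(counts) > num_bins:
            # Too many distinct values: drop the hashmap and fall back.
            counts.clear()
            break
    else:
        # The loop finished without overflow, so the map is the result.
        return "frequencies", dict(sorted(counts.items()))

    # Phase 2 (assumption): exact percentiles via sorting, as a stand-in
    # for ApproximatePercentile. Endpoint i is the i/num_bins quantile.
    ordered = sorted(values)
    last = len(ordered) - 1
    endpoints = [ordered[(i * last) // num_bins] for i in range(num_bins + 1)]
    return "endpoints", endpoints
```

For example, `histogram_endpoints([3, 1, 3, 2, 1, 3], 4)` stays in the hashmap phase because there are only three distinct values, while `histogram_endpoints(list(range(1, 11)), 2)` overflows the map and returns three endpoints that split the ten values into two bins of roughly equal height.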



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

