[jira] [Commented] (HIVE-16255) Support percentile_cont / percentile_disc

Miklos Gergely (JIRA) Tue, 15 Jan 2019 02:55:23 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-16255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742968#comment-16742968
 ]


Miklos Gergely commented on HIVE-16255:
---------------------------------------

[~ashutoshc] after discussing the issue with Zoltan we came to these 
conclusions:
 * The solution that [~abstractdog] created is basically good, though it does 
not follow the standard syntax defined at 
[https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions110.htm] and 
[https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions111.htm], 
which would require introducing the "WITHIN GROUP" clause. With that clause 
mandatory, the elements would arrive in order, so a list could be used to 
store them instead of a multiset.
 * This solution has O(n) memory cost, which may be too much. The only way we 
could avoid this is to iterate over the elements twice: first to count them, 
then to find the appropriate element. This would mean that for such functions 
there should be a way to configure the execution to do so. This is obviously a 
greater task than the scope of this issue.
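
To illustrate the second point, here is a minimal Python sketch of the 
two-pass idea. It is not Hive code: the function name and the open_sorted 
callback are hypothetical, and it assumes the elements arrive already sorted 
(as WITHIN GROUP would guarantee) and that the input can be re-opened for a 
second iteration.

```python
from typing import Callable, Iterable, Optional


def percentile_disc_two_pass(
    open_sorted: Callable[[], Iterable[Optional[float]]], fraction: float
) -> Optional[float]:
    """Hypothetical helper: discrete percentile over an already sorted input
    that can be re-opened, so no element has to be buffered in memory."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    # Pass 1: count the non-null elements.
    n = sum(1 for v in open_sorted() if v is not None)
    if n == 0:
        return None

    # Pass 2: return the first element whose cumulative share (rank / n)
    # reaches the requested fraction.
    seen = 0
    for v in open_sorted():
        if v is None:
            continue
        seen += 1
        if seen / n >= fraction:
            return v
    return None  # not reached for a fraction in [0, 1]


data = [10, 20, 30, 40]
print(percentile_disc_two_pass(lambda: iter(data), 0.5))  # 20
```

Only a counter and the current element are held at any time, at the price of 
reading the input twice. The list-based approach from the first point would 
instead keep all n elements and index into them directly.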

 

> Support percentile_cont / percentile_disc
> -----------------------------------------
>
>                 Key: HIVE-16255
>                 URL: https://issues.apache.org/jira/browse/HIVE-16255
>             Project: Hive
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Carter Shanklin
>            Assignee: Laszlo Bodor
>            Priority: Major
>         Attachments: HIVE-16255.01.patch, HIVE-16255.02.patch, 
> HIVE-16255.03.patch, HIVE-16255.04.patch, HIVE-16255.05.patch
>
>
> Way back in HIVE-259, a percentile function was added that provides a subset 
> of the standard percentile_cont aggregate function.
> The SQL standard provides some additional options and also a percentile_disc 
> aggregate function with different rules. In the standard you specify an 
> ordering with arbitrary value expression and the results are drawn from this 
> value expression. These aggregate functions should be usable as analytic 
> functions as well (i.e. support the over clause). The current percentile 
> function is able to be used with an over clause.
> The rough outline of how this works is:
> percentile_cont(number) within group (order by expression) [ over(window 
> spec) ]
> percentile_disc(number) within group (order by expression) [ over(window 
> spec) ]
> The value of number should be between 0 and 1. The value expression is 
> evaluated for each row of the group, nulls are discarded, and the remaining 
> rows are ordered. The result is then computed as follows:
> * If PERCENTILE_CONT is specified, by considering the pair of consecutive 
> rows that are indicated by the argument, treated as a fraction of the total 
> number of rows in the group, and interpolating the value of the value 
> expression evaluated for these rows.
> * If PERCENTILE_DISC is specified, by treating the group as a window 
> partition of the CUME_DIST window function, using the specified ordering of 
> the value expression as the window ordering, and returning the first value 
> expression whose cumulative distribution value is greater than or equal to 
> the argument.
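
The two rules quoted above can be sketched in Python as follows. This is an 
illustration only, not the attached patch: the function names mirror the SQL 
names, ascending order is assumed, and the interpolation uses the row-number 
formula from the Oracle documentation linked in the comment (RN = 1 + P * 
(N - 1), written 0-based here).

```python
import math
from typing import Iterable, Optional


def _ordered_non_null(values: Iterable[Optional[float]], fraction: float):
    """Validate the fraction, discard nulls and sort ascending."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return sorted(v for v in values if v is not None)


def percentile_cont(values: Iterable[Optional[float]], fraction: float):
    """Interpolate between the two consecutive rows the fraction falls on."""
    ordered = _ordered_non_null(values, fraction)
    if not ordered:
        return None
    rn = fraction * (len(ordered) - 1)  # 0-based fractional row number
    lo, hi = math.floor(rn), math.ceil(rn)
    if lo == hi:
        return float(ordered[lo])
    # Linear interpolation weighted by the distance to each neighbour.
    return ordered[lo] * (hi - rn) + ordered[hi] * (rn - lo)


def percentile_disc(values: Iterable[Optional[float]], fraction: float):
    """Return the first value whose CUME_DIST is >= the fraction."""
    ordered = _ordered_non_null(values, fraction)
    n = len(ordered)
    for rank, v in enumerate(ordered, start=1):
        if rank / n >= fraction:  # rank / n is CUME_DIST when values are distinct
            return v
    return None  # only reached for an empty group


rows = [10, 20, None, 30, 40]
print(percentile_cont(rows, 0.5))  # 25.0
print(percentile_disc(rows, 0.5))  # 20
```

Both functions buffer and sort the whole group, which corresponds to the O(n) 
memory cost discussed in the comment. percentile_disc always returns a value 
that occurs in the input, while percentile_cont may return an interpolated 
one.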



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
