[ https://issues.apache.org/jira/browse/HIVE-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom Temple updated HIVE-7177:
-----------------------------
    Environment: Redhat 5.10 running Cloudera 5.0.1  (was: Redhat 5.10 running Cloudera 5.0.0)

> percentile_approx very inaccurate with high multiplicities in the data
> ----------------------------------------------------------------------
>
>                 Key: HIVE-7177
>                 URL: https://issues.apache.org/jira/browse/HIVE-7177
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.12.0
>         Environment: Redhat 5.10 running Cloudera 5.0.1
>            Reporter: Tom Temple
>
> To reproduce:
> 1) create a table with a single integer column
> 2) with values: 1 million, 2 million, 3 million, and 4 million each repeated a quarter million times.
> 3) percentile_approx(cast(col_0 as double), array(0.33,0.34),1000000)
> Expected results: [2000000.0,2000000.0]
> Actual results: [1280000.0,1320000.0] (I might be off by 40000 here)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
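
For reference, the expected results in the report can be checked independently of Hive. The following is a minimal Python sketch, not Hive's implementation: it computes exact percentiles over the dataset described in the reproduction steps, assuming the common convention of linear interpolation between the two closest ranks. The names `exact_percentile` and `data` are hypothetical and used only for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def exact_percentile(value_counts, p):
    """Exact percentile using linear interpolation between closest ranks.

    value_counts: list of (value, count) pairs, sorted by value.
    p: fraction in [0, 1].
    Assumption: this interpolation rule is a common convention, not
    necessarily the one Hive uses internally.
    """
    values = [v for v, _ in value_counts]
    cumulative = list(accumulate(c for _, c in value_counts))
    n = cumulative[-1]

    def value_at(index):
        # Value at a 0-based position in the fully expanded, sorted data.
        return values[bisect_right(cumulative, index)]

    position = p * (n - 1)
    lower = int(position)
    upper = min(lower + 1, n - 1)
    fraction = position - lower
    return value_at(lower) + fraction * (value_at(upper) - value_at(lower))


# Dataset from the report: four distinct values, each repeated 250,000 times.
data = [(1_000_000, 250_000), (2_000_000, 250_000),
        (3_000_000, 250_000), (4_000_000, 250_000)]

print([exact_percentile(data, p) for p in (0.33, 0.34)])
```

Under that assumption, both the 0.33 and 0.34 percentiles fall inside the run of 2,000,000 values (positions 250,000 through 499,999 of the sorted data), which matches the expected results quoted in the report.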