[ https://issues.apache.org/jira/browse/HIVE-13574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mich Talebzadeh updated HIVE-13574: ----------------------------------- Target Version/s: (was: 2.0.0) > Standard Deviation in Hive 2 is still incorrect > ----------------------------------------------- > > Key: HIVE-13574 > URL: https://issues.apache.org/jira/browse/HIVE-13574 > Project: Hive > Issue Type: Bug > Affects Versions: 2.0.0 > Environment: All > Reporter: Mich Talebzadeh > Assignee: Mich Talebzadeh > Priority: Minor > Labels: patch > Original Estimate: 48h > Remaining Estimate: 48h > > I reported back in April 2015 that what Hive calls Standard Deviation > Function STDDEV is a pointer to STDDEV_POP. This is incorrect and has not > been rectified in Hive 2 > Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also I did > tests with Spark 1.6 as well and Spark correctly points STTDEV to STDDEV_SAMP. > The following query was used > SELECT > > SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1)) > AS MYSTDDEV, > STDDEV(amount_sold) AS STDDEV, > STDDEV_SAMP(amount_sold) AS STDDEV_SAMP, > STDDEV_POP(amount_sold) AS STDDEV_POP > from sales; > The following is from running the above query on Hive where STDDEV --> > STDDEV_POP which is incorrect > +--------------------+---------------------+--------------------+---------------------+--+ > | mystddev | stddev | stddev_samp | > stddev_pop | > +--------------------+---------------------+--------------------+---------------------+--+ > | 260.7270919450411 | 260.72704617040444 | 260.7270722861465 | > 260.72704617040444 | > +--------------------+---------------------+--------------------+---------------------+--+ > The following is from Spark-sql where STDDEV --> STDDEV_SAMP which is correct > +--------------------+---------------------+--------------------+---------------------+--+ > | mystddev | stddev | stddev_samp | > stddev_pop | > +--------------------+---------------------+--------------------+---------------------+--+ > | 260.7270919450411 | 260.7270722861637 | 260.7270722861637 | > 260.72704617042166 | > +--------------------+---------------------+--------------------+---------------------+--+ > Hopefully The Hive one will be corrected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)