Mich Talebzadeh created HIVE-13574:
--------------------------------------

             Summary: Standard Deviation in Hive 2 is still incorrect
                 Key: HIVE-13574
                 URL: https://issues.apache.org/jira/browse/HIVE-13574
             Project: Hive
          Issue Type: Bug
          Components: Build Infrastructure
    Affects Versions: 2.0.0
         Environment: All
            Reporter: Mich Talebzadeh
            Assignee: Mich Talebzadeh
            Priority: Minor
             Fix For: 0.10.1


I reported back in April 2015 that what Hive calls Standard Deviation Function  
STDDEV is a pointer to STDDEV_POP. This is incorrect and has not been rectified 
in Hive 2

Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also I did 
tests with Spark 1.6 as well and Spark correctly points STTDEV to STDDEV_SAMP.

The following query was used 

SELECT
        
SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1))
 AS MYSTDDEV,
        STDDEV(amount_sold) AS STDDEV,
        STDDEV_SAMP(amount_sold) AS STDDEV_SAMP,
        STDDEV_POP(amount_sold) AS STDDEV_POP
from    sales;

The following is from running the above query on Hive where STDDEV -->  
STDDEV_POP which is incorrect

+--------------------+---------------------+--------------------+---------------------+--+
|      mystddev      |       stddev        |    stddev_samp     |     
stddev_pop      |
+--------------------+---------------------+--------------------+---------------------+--+
| 260.7270919450411  | 260.72704617040444  | 260.7270722861465  | 
260.72704617040444  |
+--------------------+---------------------+--------------------+---------------------+--+

The following is from Spark-sql where STDDEV -->  STDDEV_SAMP which is correct

+--------------------+---------------------+--------------------+---------------------+--+
|      mystddev      |       stddev        |    stddev_samp     |     
stddev_pop      |
+--------------------+---------------------+--------------------+---------------------+--+
| 260.7270919450411  | 260.7270722861637   | 260.7270722861637  | 
260.72704617042166  |
+--------------------+---------------------+--------------------+---------------------+--+

Hopefully The Hive one will be corrected.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to