HIVE-13574 <https://issues.apache.org/jira/browse/HIVE-13574>
Created and assigned to myself Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 20 April 2016 at 06:54, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > Will do thanks > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > > On 19 April 2016 at 23:33, Alan Gates <alanfga...@gmail.com> wrote: > >> Have you filed a JIRA ticket for this? If not, please do so we can track >> it and fix it. Patches are welcomed as well. :) >> >> Alan. >> >> > On Apr 4, 2016, at 15:27, Mich Talebzadeh <mich.talebza...@gmail.com> >> wrote: >> > >> > >> > Hi, >> > >> > I reported back in April 2015 that what Hive calls Standard Deviation >> Function STDDEV is a pointer to STDDEV_POP. This is incorrect and has not >> been rectified in Hive 2 >> > >> > Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also >> I did tests with Spark 1.6 as well and Spark correctly points STTDEV to >> STDDEV_SAMP. >> > >> > The following query was used >> > >> > SELECT >> > >> >> SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1)) >> AS MYSTDDEV, >> > STDDEV(amount_sold) AS STDDEV, >> > STDDEV_SAMP(amount_sold) AS STDDEV_SAMP, >> > STDDEV_POP(amount_sold) AS STDDEV_POP >> > from sales; >> > >> > The following is from running the above query on Hive where STDDEV --> >> STDDEV_POP which is incorrect >> > >> > >> > >> +--------------------+---------------------+--------------------+---------------------+--+ >> > | mystddev | stddev | stddev_samp | >> stddev_pop | >> > >> +--------------------+---------------------+--------------------+---------------------+--+ >> > | 260.7270919450411 | 260.72704617040444 | 260.7270722861465 | >> 260.72704617040444 | >> > >> +--------------------+---------------------+--------------------+---------------------+--+ >> > >> > The following is from Spark-sql where STDDEV --> STDDEV_SAMP which is >> correct >> > >> > >> +--------------------+---------------------+--------------------+---------------------+--+ >> > | mystddev | stddev | stddev_samp | >> stddev_pop | >> > >> +--------------------+---------------------+--------------------+---------------------+--+ >> > | 260.7270919450411 | 260.7270722861637 | 260.7270722861637 | >> 260.72704617042166 | >> > >> +--------------------+---------------------+--------------------+---------------------+--+ >> > >> > Hopefully The Hive one will be corrected. >> > >> > Thanks >> > >> > >> > Dr Mich Talebzadeh >> > >> > LinkedIn >> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> > >> > http://talebzadehmich.wordpress.com >> > >> >> >