I've opened HIVE-10160 <https://issues.apache.org/jira/browse/HIVE-10160>: Give a warning when grouping or ordering by a constant column.
Thanks Gopal. -- Lefty On Tue, Mar 31, 2015 at 2:14 AM, Lefty Leverenz <leftylever...@gmail.com> wrote: > ---------- Forwarded message ---------- > From: Lefty Leverenz <leftylever...@gmail.com> > Date: Tue, Mar 31, 2015 at 1:47 AM > Subject: Re: ORDER BY clause in Hive > To: Mich Talebzadeh <m...@peridale.co.uk> > > > Hive as I see it does not support ORDER BY *Column position*. It only >> supports ORDER BY *Column name*. >> > > That's just in Hive release 0.10.0 and earlier. In release 0.11.0+ you > can set the configuration parameter > <https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive> > *hive.groupby.orderby.position.**alias* to true, and then you can use > column positions in ORDER BY. > > Here's the new documentation: > > - hive.groupby.orderby.position.alias > > <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias> > - Order By > > <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy> > > - Group By > > <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax> > > > -- Lefty > > On Mon, Mar 30, 2015 at 4:46 AM, Mich Talebzadeh <m...@peridale.co.uk> > wrote: > >> Gents, >> >> >> >> Hive as I see it does not support ORDER BY *Column position*. It only >> supports ORDER BY *Column name*. >> >> >> >> Thanks >> >> >> >> >> >> Mich Talebzadeh >> >> >> >> http://talebzadehmich.wordpress.com >> >> >> >> *Publications due shortly:* >> >> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen >> and Coherence Cache* >> >> >> >> NOTE: The information in this email is proprietary and confidential. This >> message is for the designated recipient only, if you are not the intended >> recipient, you should destroy it immediately. Any information in this >> message shall not be understood as given or endorsed by Peridale Ltd, its >> subsidiaries or their employees, unless expressly so stated. It is the >> responsibility of the recipient to ensure that this email is virus free, >> therefore neither Peridale Ltd, its subsidiaries nor their employees accept >> any responsibility. >> >> >> >> *From:* Gopal Vijayaraghavan [mailto:go...@hortonworks.com] *On Behalf >> Of *Gopal Vijayaraghavan >> *Sent:* 30 March 2015 05:26 >> *To:* user@hive.apache.org >> *Cc:* Lefty Leverenz >> >> *Subject:* Re: ORDER BY clause in Hive >> >> >> >> Hi Lefty, >> >> >> >> Couldn’t find the documentation for what >> hive.groupby.orderby.position.alias=true does. >> >> >> >> I suspect that might be what Mich was looking for (though I tend to write >> the column names explicitly). >> >> >> >> Cheers, >> >> Gopal >> >> >> >> *From: *Lefty Leverenz <leftylever...@gmail.com> >> *Reply-To: *"user@hive.apache.org" <user@hive.apache.org> >> *Date: *Sunday, March 29, 2015 at 8:32 PM >> *To: *"user@hive.apache.org" <user@hive.apache.org> >> *Subject: *Re: ORDER BY clause in Hive >> >> >> >> I added information about this in the Order By >> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy> >> section of the wiki. Thanks, Mich and Gopal! >> >> >> -- Lefty >> >> >> >> On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <m...@peridale.co.uk> >> wrote: >> >> Hi Lefty, Gopal, >> >> >> >> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result >> set column three” which standard SQL as evident from Oracle and Sybase does. >> >> >> >> So I made it an ORDER BY from the result set EXPLICITELY as shown below >> and it worked OK >> >> >> >> SELECT >> >> rs.Customer_ID >> >> , rs.Number_of_orders >> >> , rs.Total_customer_amount >> >> , rs.Average_order >> >> , rs.Standard_deviation >> >> FROM >> >> ( >> >> SELECT cust_id AS Customer_ID, >> >> COUNT(amount_sold) AS Number_of_orders, >> >> SUM(amount_sold) AS Total_customer_amount, >> >> AVG(amount_sold) AS Average_order, >> >> STDDEV(amount_sold) AS Standard_deviation >> >> FROM sales >> >> GROUP BY cust_id >> >> HAVING SUM(amount_sold) > 94000 >> >> AND AVG(amount_sold) < STDDEV(amount_sold) >> >> ) rs >> >> ORDER BY >> >> *rs.Total_customer_amount* >> >> ; >> >> >> >> >> >> >> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+ >> >> | rs.customer_id | rs.number_of_orders | rs.total_customer_amount | >> rs.average_order | rs.standard_deviation | >> >> >> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+ >> >> | 1743.0 | 238 | 94786.12999999993 | >> 398.2610504201678 | 581.0439095219863 | >> >> | 429.0 | 231 | 94819.41000000006 | >> 410.4736363636366 | 613.7057080691426 | >> >> | 2994.0 | 227 | 94862.61000000006 | >> 417.89696035242315 | 623.1607772763742 | >> >> | 6395.0 | 268 | 97010.47999999998 | >> 361.97940298507456 | 576.9120977984521 | >> >> | 12783.0 | 240 | 97573.54999999996 | >> 406.5564583333332 | 590.4445500393804 | >> >> | 4974.0 | 235 | 98006.16000000002 | >> 417.0474893617022 | 624.337482834059 | >> >> | 42167.0 | 266 | 98585.96000000002 | >> 370.6239097744362 | 590.965120684093 | >> >> | 10747.0 | 256 | 99578.08999999997 | >> 388.9769140624999 | 600.7615005975689 | >> >> | 11407.0 | 248 | 103412.65999999995 | >> 416.9865322580643 | 622.221465710723 | >> >> >> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+ >> >> 9 rows selected (209.699 seconds) >> >> >> >> >> >> Regards, >> >> >> >> Mich Talebzadeh >> >> >> >> http://talebzadehmich.wordpress.com >> >> >> >> *Publications due shortly:* >> >> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen >> and Coherence Cache* >> >> >> >> NOTE: The information in this email is proprietary and confidential. This >> message is for the designated recipient only, if you are not the intended >> recipient, you should destroy it immediately. Any information in this >> message shall not be understood as given or endorsed by Peridale Ltd, its >> subsidiaries or their employees, unless expressly so stated. It is the >> responsibility of the recipient to ensure that this email is virus free, >> therefore neither Peridale Ltd, its subsidiaries nor their employees accept >> any responsibility. >> >> >> >> *From:* Mich Talebzadeh [mailto:m...@peridale.co.uk] >> *Sent:* 29 March 2015 00:11 >> *To:* user@hive.apache.org >> *Subject:* ORDER BY clause in Hive >> >> >> >> Hi, >> >> >> >> Can someone point me to doc or otherwise to see if ORDER BY clause in >> Hive is working OK >> >> >> >> I have a simple aggregate query as follows: >> >> >> >> SELECT cust_id AS Customer_ID, >> >> COUNT(amount_sold) AS Number_of_orders, >> >> SUM(amount_sold) AS Total_customer_amount, >> >> AVG(amount_sold) AS Average_order, >> >> STDDEV(amount_sold) AS Standard_deviation >> >> FROM sales >> >> GROUP BY cust_id >> >> HAVING SUM(amount_sold) > 94000 >> >> AND AVG(amount_sold) < STDDEV(amount_sold) >> >> ORDER BY 3 ; >> >> >> >> The original table and data are from Oracle sh.sales table >> >> >> >> Oracle comes back for this query with >> >> >> >> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER >> STANDARD_DEVIATION >> >> ----------- ---------------- --------------------- ------------- >> ------------------ >> >> 1743 238 94786.13 398.26105 >> 582.26845 >> >> 429 231 94819.41 410.473636 >> 615.038404 >> >> 2994 227 94862.61 417.89696 >> 624.53793 >> >> 6395 268 97010.48 361.979403 >> 577.991448 >> >> 12783 240 97573.55 >> 406.556458 591.6785 >> >> 4974 235 98006.16 417.047489 >> 625.670115 >> >> 42167 266 98585.96 370.62391 >> 592.079099 >> >> 10747 256 99578.09 388.976914 >> 601.938312 >> >> 11407 248 103412.66 416.986532 >> 623.479751 >> >> >> >> 9 rows selected. >> >> >> >> Ordered by TOTAL_CUSTOMER_AMOUNT >> >> >> >> And hive returns for the same query >> >> >> >> >> +--------------+-------------------+------------------------+---------------------+---------------------+--+ >> >> | customer_id | number_of_orders | total_customer_amount | >> average_order | standard_deviation | >> >> >> +--------------+-------------------+------------------------+---------------------+---------------------+--+ >> >> | 42167.0 | 266 | 98585.96000000002 | >> 370.6239097744362 | 590.965120684093 | >> >> | 12783.0 | 240 | 97573.54999999996 | >> 406.5564583333332 | 590.4445500393804 | >> >> | 11407.0 | 248 | 103412.65999999995 | >> 416.9865322580643 | 622.221465710723 | >> >> | 10747.0 | 256 | 99578.08999999997 | >> 388.9769140624999 | 600.7615005975689 | >> >> | 6395.0 | 268 | 97010.47999999998 | >> 361.97940298507456 | 576.9120977984521 | >> >> | 4974.0 | 235 | 98006.16000000002 | >> 417.0474893617022 | 624.337482834059 | >> >> | 2994.0 | 227 | 94862.61000000006 | >> 417.89696035242315 | 623.1607772763742 | >> >> | 1743.0 | 238 | 94786.12999999993 | >> 398.2610504201678 | 581.0439095219863 | >> >> | 429.0 | 231 | 94819.41000000006 | >> 410.4736363636366 | 613.7057080691426 | >> >> >> +--------------+-------------------+------------------------+---------------------+---------------------+--+ >> >> 9 rows selected (215.774 seconds) >> >> >> >> But ordering in Hive does not seem to be correct! Please note ordering is >> on column three, *total_customer_amount* >> >> >> >> I also tried this in Sybase and got the same as Oracle. >> >> >> >> Adaptive Server cpu time: 100 ms. >> >> Customer_ID Number_of_orders >> Total_customer_amount >> Average_order Standard_deviation >> >> -------------------- ---------------- >> ----------------------------------------- >> ---------------------------------------------------- >> --------------------------- >> >> 1743 238 >> 94786.13 >> 398.2610504201680 582.268450 >> >> 429 231 >> 94819.41 >> 410.4736363636363 615.038404 >> >> 2994 227 >> 94862.61 >> 417.8969603524229 624.537930 >> >> 6395 268 >> 97010.48 >> 361.9794029850746 577.991448 >> >> 12783 240 >> 97573.55 >> 406.5564583333333 591.678500 >> >> 4974 235 >> 98006.16 >> 417.0474893617021 625.670115 >> >> 42167 266 >> 98585.96 >> 370.6239097744360 592.079099 >> >> 10747 256 >> 99578.09 388.9769140625000 >> 601.938312 >> >> 11407 248 >> 103412.66 >> 416.9865322580645 623.479751 >> >> >> >> I tried Google search and seems to be different suggestions. May be I >> have to rewrite the code? >> >> >> >> Thanks >> >> >> >> Mich Talebzadeh >> >> >> >> http://talebzadehmich.wordpress.com >> >> >> >> *Publications due shortly:* >> >> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen >> and Coherence Cache* >> >> >> >> NOTE: The information in this email is proprietary and confidential. This >> message is for the designated recipient only, if you are not the intended >> recipient, you should destroy it immediately. Any information in this >> message shall not be understood as given or endorsed by Peridale Ltd, its >> subsidiaries or their employees, unless expressly so stated. It is the >> responsibility of the recipient to ensure that this email is virus free, >> therefore neither Peridale Ltd, its subsidiaries nor their employees accept >> any responsibility. >> >> >> >> >> > > >