Gents,
Hive as I see it does not support ORDER BY Column position. It only supports ORDER BY Column name. Thanks Mich Talebzadeh http://talebzadehmich.wordpress.com Publications due shortly: Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and Coherence Cache NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility. From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal Vijayaraghavan Sent: 30 March 2015 05:26 To: user@hive.apache.org Cc: Lefty Leverenz Subject: Re: ORDER BY clause in Hive Hi Lefty, Couldn't find the documentation for what hive.groupby.orderby.position.alias=true does. I suspect that might be what Mich was looking for (though I tend to write the column names explicitly). Cheers, Gopal From: Lefty Leverenz <leftylever...@gmail.com> Reply-To: "user@hive.apache.org" <user@hive.apache.org> Date: Sunday, March 29, 2015 at 8:32 PM To: "user@hive.apache.org" <user@hive.apache.org> Subject: Re: ORDER BY clause in Hive I added information about this in the Order By <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#Lang uageManualSortBy-SyntaxofOrderBy> section of the wiki. Thanks, Mich and Gopal! -- Lefty On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <m...@peridale.co.uk> wrote: Hi Lefty, Gopal, It appears that ORDER BY 3 is not interpreted as ORDERR BY "the result set column three" which standard SQL as evident from Oracle and Sybase does. So I made it an ORDER BY from the result set EXPLICITELY as shown below and it worked OK SELECT rs.Customer_ID , rs.Number_of_orders , rs.Total_customer_amount , rs.Average_order , rs.Standard_deviation FROM ( SELECT cust_id AS Customer_ID, COUNT(amount_sold) AS Number_of_orders, SUM(amount_sold) AS Total_customer_amount, AVG(amount_sold) AS Average_order, STDDEV(amount_sold) AS Standard_deviation FROM sales GROUP BY cust_id HAVING SUM(amount_sold) > 94000 AND AVG(amount_sold) < STDDEV(amount_sold) ) rs ORDER BY rs.Total_customer_amount ; +-----------------+----------------------+---------------------------+------ ---------------+------------------------+--+ | rs.customer_id | rs.number_of_orders | rs.total_customer_amount | rs.average_order | rs.standard_deviation | +-----------------+----------------------+---------------------------+------ ---------------+------------------------+--+ | 1743.0 | 238 | 94786.12999999993 | 398.2610504201678 | 581.0439095219863 | | 429.0 | 231 | 94819.41000000006 | 410.4736363636366 | 613.7057080691426 | | 2994.0 | 227 | 94862.61000000006 | 417.89696035242315 | 623.1607772763742 | | 6395.0 | 268 | 97010.47999999998 | 361.97940298507456 | 576.9120977984521 | | 12783.0 | 240 | 97573.54999999996 | 406.5564583333332 | 590.4445500393804 | | 4974.0 | 235 | 98006.16000000002 | 417.0474893617022 | 624.337482834059 | | 42167.0 | 266 | 98585.96000000002 | 370.6239097744362 | 590.965120684093 | | 10747.0 | 256 | 99578.08999999997 | 388.9769140624999 | 600.7615005975689 | | 11407.0 | 248 | 103412.65999999995 | 416.9865322580643 | 622.221465710723 | +-----------------+----------------------+---------------------------+------ ---------------+------------------------+--+ 9 rows selected (209.699 seconds) Regards, Mich Talebzadeh http://talebzadehmich.wordpress.com Publications due shortly: Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and Coherence Cache NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility. From: Mich Talebzadeh [mailto:m...@peridale.co.uk] Sent: 29 March 2015 00:11 To: user@hive.apache.org Subject: ORDER BY clause in Hive Hi, Can someone point me to doc or otherwise to see if ORDER BY clause in Hive is working OK I have a simple aggregate query as follows: SELECT cust_id AS Customer_ID, COUNT(amount_sold) AS Number_of_orders, SUM(amount_sold) AS Total_customer_amount, AVG(amount_sold) AS Average_order, STDDEV(amount_sold) AS Standard_deviation FROM sales GROUP BY cust_id HAVING SUM(amount_sold) > 94000 AND AVG(amount_sold) < STDDEV(amount_sold) ORDER BY 3 ; The original table and data are from Oracle sh.sales table Oracle comes back for this query with CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER STANDARD_DEVIATION ----------- ---------------- --------------------- ------------- ------------------ 1743 238 94786.13 398.26105 582.26845 429 231 94819.41 410.473636 615.038404 2994 227 94862.61 417.89696 624.53793 6395 268 97010.48 361.979403 577.991448 12783 240 97573.55 406.556458 591.6785 4974 235 98006.16 417.047489 625.670115 42167 266 98585.96 370.62391 592.079099 10747 256 99578.09 388.976914 601.938312 11407 248 103412.66 416.986532 623.479751 9 rows selected. Ordered by TOTAL_CUSTOMER_AMOUNT And hive returns for the same query +--------------+-------------------+------------------------+--------------- ------+---------------------+--+ | customer_id | number_of_orders | total_customer_amount | average_order | standard_deviation | +--------------+-------------------+------------------------+--------------- ------+---------------------+--+ | 42167.0 | 266 | 98585.96000000002 | 370.6239097744362 | 590.965120684093 | | 12783.0 | 240 | 97573.54999999996 | 406.5564583333332 | 590.4445500393804 | | 11407.0 | 248 | 103412.65999999995 | 416.9865322580643 | 622.221465710723 | | 10747.0 | 256 | 99578.08999999997 | 388.9769140624999 | 600.7615005975689 | | 6395.0 | 268 | 97010.47999999998 | 361.97940298507456 | 576.9120977984521 | | 4974.0 | 235 | 98006.16000000002 | 417.0474893617022 | 624.337482834059 | | 2994.0 | 227 | 94862.61000000006 | 417.89696035242315 | 623.1607772763742 | | 1743.0 | 238 | 94786.12999999993 | 398.2610504201678 | 581.0439095219863 | | 429.0 | 231 | 94819.41000000006 | 410.4736363636366 | 613.7057080691426 | +--------------+-------------------+------------------------+--------------- ------+---------------------+--+ 9 rows selected (215.774 seconds) But ordering in Hive does not seem to be correct! Please note ordering is on column three, total_customer_amount I also tried this in Sybase and got the same as Oracle. Adaptive Server cpu time: 100 ms. Customer_ID Number_of_orders Total_customer_amount Average_order Standard_deviation -------------------- ---------------- ----------------------------------------- ---------------------------------------------------- --------------------------- 1743 238 94786.13 398.2610504201680 582.268450 429 231 94819.41 410.4736363636363 615.038404 2994 227 94862.61 417.8969603524229 624.537930 6395 268 97010.48 361.9794029850746 577.991448 12783 240 97573.55 406.5564583333333 591.678500 4974 235 98006.16 417.0474893617021 625.670115 42167 266 98585.96 370.6239097744360 592.079099 10747 256 99578.09 388.9769140625000 601.938312 11407 248 103412.66 416.9865322580645 623.479751 I tried Google search and seems to be different suggestions. May be I have to rewrite the code? Thanks Mich Talebzadeh http://talebzadehmich.wordpress.com Publications due shortly: Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and Coherence Cache NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.