i thought you managed to solve this with rank?? On Tue, Jul 10, 2012 at 12:38 PM, Raihan Jamal <jamalrai...@gmail.com>wrote:
> Problem with that approach is, with LIMIT 10, If I am putting after desc, > then it will get only 10 rows irrespective of BUYER_ID. But I need > specifically for each BUYER_ID 10 latest rows. > > > > > *Raihan Jamal* > > > > On Tue, Jul 10, 2012 at 12:03 AM, Abhishek Tiwari < > abhishektiwari.bt...@gmail.com> wrote: > >> Raihan, >> >> Andes suggests you use 'limit' after 'desc' ie Hive with sort your query >> results in descending order and then chop off any more than first 10 >> records. >> However, Hive will still run the full scan on data since it has to find >> the sorted list of records for you. >> >> >> On Tue, Jul 10, 2012 at 8:36 AM, Raihan Jamal <jamalrai...@gmail.com>wrote: >> >>> This is my below requirement. I need- *Find `TOP 10` data for each >>> `BUYER_ID, *So I cannot use LIMIT 10 here in this case. >>> >>> This is the below table >>> >>> CREATE TABLE IF NOT EXISTS TestingTable1 >>> ( >>> BUYER_ID BIGINT, >>> ITEM_ID BIGINT, >>> CREATED_TIME STRING >>> ) >>> >>> And this is the below data in the above table- >>> >>> BUYER_ID | ITEM_ID | CREATED_TIME >>> ------------+------------------+----------------------- >>> 1015826235 220003038067 2012-07-09 19:40:21, >>> 1015826235 300003861266 2012-07-09 18:19:59, >>> 1015826235 140002997245 2012-07-09 09:23:17, >>> 1015826235 210002448035 2012-07-09 22:21:11, >>> 1015826235 260003553381 2012-07-09 07:09:56, >>> 1015826235 260003553382 2012-07-09 19:40:39, >>> 1015826235 260003553383 2012-07-09 06:58:47, >>> 1015826235 260003553384 2012-07-09 07:28:47, >>> 1015826235 260003553385 2012-07-09 08:48:47, >>> 1015826235 260003553386 2012-07-09 06:38:47, >>> 1015826235 260003553387 2012-07-09 05:38:47, >>> 1015826235 260003553388 2012-07-09 04:55:47, >>> 1015826235 260003553389 2012-07-09 06:54:37, >>> 34512201 597245693 2012-07-09 16:20:21, >>> 34512201 8071787728 2012-07-09 15:19:59, >>> 34512201 5868222883 2012-07-09 08:23:17, >>> 34512201 2412180494 2012-07-09 22:21:11, >>> 34512201 2422054205 2012-07-09 06:09:56, >>> 34512201 1875744030 2012-07-09 19:40:39, >>> 34512201 5639158173 2012-07-09 06:58:47, >>> 34512201 5656232360 2012-07-09 07:28:47, >>> 34512201 959188449 2012-07-09 08:48:47, >>> 34512201 4645350592 2012-07-09 06:38:47, >>> 34512201 5657320532 2012-07-09 05:38:47, >>> 34512201 290419656539 2012-07-09 04:55:47, >>> >>> If you see the above data in the table, there are only two UNIQUE >>> `BUYER_ID` and corresponding to those I have `ITEM_ID` AND `CREATED_TIME`. >>> I need only 10 latest record basis on the time for that given day for each >>> `BUYER_ID`. >>> >>> So for this `BUYER_ID` - `34512201` I need 10 latest record basis on >>> `CREATED_TIME` for that given day only, it means for today's date I need 10 >>> latest record for each `BUYER_ID`. >>> >>> And each `BUYER_ID` can have any day's data. But I am specifically >>> interested for day before today's data(means yesterday's date always) by >>> checking at the `CREATED_TIME` >>> >>> **Find `TOP 10` data for each `BUYER_ID`. Below is the sample output.** >>> >>> Sample Output. >>> >>> BUYER_ID | ITEM_ID | CREATED_TIME >>> ------------+------------------+----------------------- >>> 34512201 2412180494 2012-07-09 22:21:11 >>> 34512201 1875744030 2012-07-09 19:40:39 >>> 34512201 597245693 2012-07-09 16:20:21 >>> 34512201 8071787728 2012-07-09 15:19:59 >>> 34512201 959188449 2012-07-09 08:48:47 >>> 34512201 5868222883 2012-07-09 08:23:17 >>> 34512201 5656232360 2012-07-09 07:28:47 >>> 34512201 5639158173 2012-07-09 06:58:47 >>> 34512201 4645350592 2012-07-09 06:38:47 >>> 34512201 2422054205 2012-07-09 06:09:56 >>> 1015826235 210002448035 2012-07-09 22:21:11 >>> 1015826235 260003553382 2012-07-09 19:40:39 >>> 1015826235 220003038067 2012-07-09 19:40:21 >>> 1015826235 300003861266 2012-07-09 18:19:59 >>> 1015826235 140002997245 2012-07-09 09:23:17 >>> 1015826235 260003553385 2012-07-09 08:48:47 >>> 1015826235 260003553384 2012-07-09 07:28:47 >>> 1015826235 260003553381 2012-07-09 07:09:56 >>> 1015826235 260003553383 2012-07-09 06:58:47 >>> 1015826235 260003553389 2012-07-09 06:54:37 >>> >>> >>> *Raihan Jamal* >>> >>> >>> >>> On Mon, Jul 9, 2012 at 7:56 PM, Andes <ylyy-1...@163.com> wrote: >>> >>>> ** >>>> hello, you can use "desc" and "limit 10" to filter the top 10. >>>> >>>> 2012-07-10 >>>> ------------------------------ >>>> ** >>>> Best Regards >>>> Andes >>>> >>>> ** >>>> ------------------------------ >>>> *发件人:*Raihan Jamal >>>> *发送时间:*2012-07-10 10:31 >>>> *主题:*Find TOP 10 using HiveQL >>>> *收件人:*"user"<user@hive.apache.org> >>>> *抄送:* >>>> >>>> When I run this query, >>>> >>>> SELECT TOP 10 FROM TestingTable1 WHERE ORDER BY buyer_id, created_time >>>> DESC; >>>> >>>> >>>> I always get error as- >>>> >>>> *FAILED: Parse Error: line 1:7 cannot recognize input 'TOP' in select >>>> expression* >>>> >>>> Is there any way around to use TOP 10 or something similar that will >>>> work in HiveQL? >>>> >>>> >>>> >>>> *Raihan Jamal* >>>> >>>> >>> >> >> >> >> >> > -- Nitin Pawar