You could use TRANSFORM with a simple awk script: TRANSFORM(a, b, c, d) USING "/usr/bin/awk ' {if($1!=c){c=$1; a=0}; if(a<20){print $0; a++}}'"
This will create a top 20 for each group. --Wouter de Bie Team Lead Analytics Infrastructure, Spotify wou...@spotify.com (mailto:wou...@spotify.com) +46 72 018 0777 On Tuesday, July 10, 2012 at 9:15 AM, Raihan Jamal wrote: > I am trying that solution. Currently I am running my query to see what result > I am getting back with UDF. > > > > > Raihan Jamal > > > On Tue, Jul 10, 2012 at 12:13 AM, Nitin Pawar <nitinpawar...@gmail.com > (mailto:nitinpawar...@gmail.com)> wrote: > > i thought you managed to solve this with rank?? > > > > > > On Tue, Jul 10, 2012 at 12:38 PM, Raihan Jamal <jamalrai...@gmail.com > > (mailto:jamalrai...@gmail.com)> wrote: > > > Problem with that approach is, with LIMIT 10, If I am putting after desc, > > > then it will get only 10 rows irrespective of BUYER_ID. But I need > > > specifically for each BUYER_ID 10 latest rows. > > > > > > > > > > > > > > > Raihan Jamal > > > > > > > > > > > > On Tue, Jul 10, 2012 at 12:03 AM, Abhishek Tiwari > > > <abhishektiwari.bt...@gmail.com (mailto:abhishektiwari.bt...@gmail.com)> > > > wrote: > > > > Raihan, > > > > > > > > Andes suggests you use 'limit' after 'desc' ie Hive with sort your > > > > query results in descending order and then chop off any more than first > > > > 10 records. > > > > However, Hive will still run the full scan on data since it has to find > > > > the sorted list of records for you. > > > > > > > > > > > > On Tue, Jul 10, 2012 at 8:36 AM, Raihan Jamal <jamalrai...@gmail.com > > > > (mailto:jamalrai...@gmail.com)> wrote: > > > > > This is my below requirement. I need- Find `TOP 10` data for each > > > > > `BUYER_ID, So I cannot use LIMIT 10 here in this case. > > > > > > > > > > This is the below table > > > > > > > > > > CREATE TABLE IF NOT EXISTS TestingTable1 > > > > > ( > > > > > BUYER_ID BIGINT, > > > > > ITEM_ID BIGINT, > > > > > CREATED_TIME STRING > > > > > ) > > > > > > > > > > And this is the below data in the above table- > > > > > > > > > > BUYER_ID | ITEM_ID | CREATED_TIME > > > > > ------------+------------------+----------------------- > > > > > 1015826235 220003038067 2012-07-09 19:40:21, > > > > > 1015826235 300003861266 2012-07-09 18:19:59, > > > > > 1015826235 140002997245 2012-07-09 09:23:17, > > > > > 1015826235 210002448035 2012-07-09 22:21:11, > > > > > 1015826235 260003553381 2012-07-09 07:09:56, > > > > > 1015826235 260003553382 2012-07-09 19:40:39, > > > > > 1015826235 260003553383 2012-07-09 06:58:47, > > > > > 1015826235 260003553384 2012-07-09 07:28:47, > > > > > 1015826235 260003553385 2012-07-09 08:48:47, > > > > > 1015826235 260003553386 2012-07-09 06:38:47, > > > > > 1015826235 260003553387 2012-07-09 05:38:47, > > > > > 1015826235 260003553388 2012-07-09 04:55:47, > > > > > 1015826235 260003553389 2012-07-09 06:54:37, > > > > > 34512201 597245693 2012-07-09 16:20:21, > > > > > 34512201 8071787728 2012-07-09 15:19:59, > > > > > 34512201 5868222883 (tel:5868222883) 2012-07-09 > > > > > 08:23:17, > > > > > 34512201 2412180494 2012-07-09 22:21:11, > > > > > 34512201 2422054205 2012-07-09 06:09:56, > > > > > 34512201 1875744030 2012-07-09 19:40:39, > > > > > 34512201 5639158173 (tel:5639158173) 2012-07-09 > > > > > 06:58:47, > > > > > 34512201 5656232360 2012-07-09 07:28:47, > > > > > 34512201 959188449 2012-07-09 08:48:47, > > > > > 34512201 4645350592 2012-07-09 06:38:47, > > > > > 34512201 5657320532 2012-07-09 05:38:47, > > > > > 34512201 290419656539 2012-07-09 04:55:47, > > > > > > > > > > If you see the above data in the table, there are only two UNIQUE > > > > > `BUYER_ID` and corresponding to those I have `ITEM_ID` AND > > > > > `CREATED_TIME`. I need only 10 latest record basis on the time for > > > > > that given day for each `BUYER_ID`. > > > > > > > > > > So for this `BUYER_ID` - `34512201` I need 10 latest record basis on > > > > > `CREATED_TIME` for that given day only, it means for today's date I > > > > > need 10 latest record for each `BUYER_ID`. > > > > > > > > > > And each `BUYER_ID` can have any day's data. But I am specifically > > > > > interested for day before today's data(means yesterday's date > > > > > always) by checking at the `CREATED_TIME` > > > > > > > > > > **Find `TOP 10` data for each `BUYER_ID`. Below is the sample > > > > > output.** > > > > > > > > > > Sample Output. > > > > > > > > > > BUYER_ID | ITEM_ID | CREATED_TIME > > > > > ------------+------------------+----------------------- > > > > > 34512201 2412180494 2012-07-09 22:21:11 > > > > > 34512201 1875744030 2012-07-09 19:40:39 > > > > > 34512201 597245693 2012-07-09 16:20:21 > > > > > 34512201 8071787728 2012-07-09 15:19:59 > > > > > 34512201 959188449 2012-07-09 08:48:47 > > > > > 34512201 5868222883 (tel:5868222883) 2012-07-09 > > > > > 08:23:17 > > > > > 34512201 5656232360 2012-07-09 07:28:47 > > > > > 34512201 5639158173 (tel:5639158173) 2012-07-09 > > > > > 06:58:47 > > > > > 34512201 4645350592 2012-07-09 06:38:47 > > > > > 34512201 2422054205 2012-07-09 06:09:56 > > > > > 1015826235 210002448035 2012-07-09 22:21:11 > > > > > 1015826235 260003553382 2012-07-09 19:40:39 > > > > > 1015826235 220003038067 2012-07-09 19:40:21 > > > > > 1015826235 300003861266 2012-07-09 18:19:59 > > > > > 1015826235 140002997245 2012-07-09 09:23:17 > > > > > 1015826235 260003553385 2012-07-09 08:48:47 > > > > > 1015826235 260003553384 2012-07-09 07:28:47 > > > > > 1015826235 260003553381 2012-07-09 07:09:56 > > > > > 1015826235 260003553383 2012-07-09 06:58:47 > > > > > 1015826235 260003553389 2012-07-09 06:54:37 > > > > > > > > > > > > > > > Raihan Jamal > > > > > > > > > > > > > > > > > > > > On Mon, Jul 9, 2012 at 7:56 PM, Andes <ylyy-1...@163.com > > > > > (mailto:ylyy-1...@163.com)> wrote: > > > > > > hello, you can use "desc" and "limit 10" to filter the top 10. > > > > > > > > > > > > 2012-07-10 > > > > > > > > > > > > Best Regards > > > > > > Andes > > > > > > > > > > > > > > > > > > > > > > > > 发件人:Raihan Jamal > > > > > > 发送时间:2012-07-10 10 (tel:2012-07-10%C2%A010):31 > > > > > > 主题:Find TOP 10 using HiveQL > > > > > > 收件人:"user"<user@hive.apache.org (mailto:user@hive.apache.org)> > > > > > > 抄送: > > > > > > > > > > > > When I run this query, > > > > > > > > > > > > SELECT TOP 10 FROM TestingTable1 WHERE ORDER BY buyer_id, > > > > > > created_time DESC; > > > > > > > > > > > > > > > > > > I always get error as- > > > > > > > > > > > > FAILED: Parse Error: line 1:7 cannot recognize input 'TOP' in > > > > > > select expression > > > > > > > > > > > > Is there any way around to use TOP 10 or something similar that > > > > > > will work in HiveQL? > > > > > > > > > > > > > > > > > > Raihan Jamal > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Nitin Pawar > > >