You could use TRANSFORM with a simple awk script:

TRANSFORM(a, b, c, d)
  USING "/usr/bin/awk '
  {if($1!=c){c=$1; a=0}; if(a<10){print $0; a++}}'"

This will create a top 10 for each group, as long as the rows reach the script already grouped by the first field and sorted by time within each group.
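The awk logic can be sanity-checked locally before wiring it into Hive. The snippet below is a minimal sketch: the group and item values are made up, and the per-group cap is lowered from 10 to 2 to keep the demo short.

```shell
# Rows must arrive grouped by field 1 and already sorted within each
# group (in Hive, DISTRIBUTE BY / SORT BY would provide this ordering).
# The awk script resets its counter whenever field 1 changes and prints
# at most 2 rows per group.
printf '%s\n' \
  'g1 itemA' 'g1 itemB' 'g1 itemC' \
  'g2 itemD' 'g2 itemE' |
awk '{if($1!=c){c=$1; a=0}; if(a<2){print $0; a++}}'
# prints:
#   g1 itemA
#   g1 itemB
#   g2 itemD
#   g2 itemE
```

Note that TRANSFORM only sees rows in whatever order the reducer receives them, so the surrounding Hive query needs a `DISTRIBUTE BY`/`SORT BY` (or `CLUSTER BY`) on the grouping column before the script runs.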

--Wouter de Bie
Team Lead Analytics Infrastructure, Spotify
wou...@spotify.com
+46 72 018 0777


On Tuesday, July 10, 2012 at 9:15 AM, Raihan Jamal wrote:

> I am trying that solution. Currently I am running my query to see what result 
> I am getting back with UDF.
>  
> Raihan Jamal  
>  
>  
> On Tue, Jul 10, 2012 at 12:13 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
> > i thought you managed to solve this with rank??
> >  
> >  
> > On Tue, Jul 10, 2012 at 12:38 PM, Raihan Jamal <jamalrai...@gmail.com> wrote:
> > > The problem with that approach is that putting LIMIT 10 after desc returns 
> > > only 10 rows in total, irrespective of BUYER_ID. But I need the 10 latest 
> > > rows for each BUYER_ID specifically.
> > >  
> > > Raihan Jamal
> > >  
> > > On Tue, Jul 10, 2012 at 12:03 AM, Abhishek Tiwari 
> > > <abhishektiwari.bt...@gmail.com> wrote:
> > > > Raihan,
> > > >  
> > > > Andes suggests you use 'limit' after 'desc', i.e. Hive will sort your 
> > > > query results in descending order and then chop off anything beyond the 
> > > > first 10 records.  
> > > > However, Hive will still run a full scan on the data, since it has to 
> > > > produce the sorted list of records for you.   
> > > >  
> > > >  
> > > > On Tue, Jul 10, 2012 at 8:36 AM, Raihan Jamal <jamalrai...@gmail.com> wrote:
> > > > > This is my requirement: find the `TOP 10` rows for each `BUYER_ID`, 
> > > > > so I cannot use LIMIT 10 in this case.
> > > > >  
> > > > > This is the below table
> > > > >  
> > > > >     CREATE TABLE IF NOT EXISTS TestingTable1  
> > > > >     (  
> > > > >     BUYER_ID BIGINT,
> > > > >     ITEM_ID BIGINT,  
> > > > >     CREATED_TIME STRING
> > > > >     )
> > > > >  
> > > > > And this is the below data in the above table-  
> > > > >  
> > > > >     BUYER_ID    |    ITEM_ID   | CREATED_TIME
> > > > >     ------------+------------------+-----------------------
> > > > >     1015826235      220003038067        2012-07-09 19:40:21
> > > > >     1015826235      300003861266        2012-07-09 18:19:59
> > > > >     1015826235      140002997245        2012-07-09 09:23:17
> > > > >     1015826235      210002448035        2012-07-09 22:21:11
> > > > >     1015826235      260003553381        2012-07-09 07:09:56
> > > > >     1015826235      260003553382        2012-07-09 19:40:39
> > > > >     1015826235      260003553383        2012-07-09 06:58:47
> > > > >     1015826235      260003553384        2012-07-09 07:28:47
> > > > >     1015826235      260003553385        2012-07-09 08:48:47
> > > > >     1015826235      260003553386        2012-07-09 06:38:47
> > > > >     1015826235      260003553387        2012-07-09 05:38:47
> > > > >     1015826235      260003553388        2012-07-09 04:55:47
> > > > >     1015826235      260003553389        2012-07-09 06:54:37
> > > > >     34512201        597245693           2012-07-09 16:20:21
> > > > >     34512201        8071787728          2012-07-09 15:19:59
> > > > >     34512201        5868222883          2012-07-09 08:23:17
> > > > >     34512201        2412180494          2012-07-09 22:21:11
> > > > >     34512201        2422054205          2012-07-09 06:09:56
> > > > >     34512201        1875744030          2012-07-09 19:40:39
> > > > >     34512201        5639158173          2012-07-09 06:58:47
> > > > >     34512201        5656232360          2012-07-09 07:28:47
> > > > >     34512201        959188449           2012-07-09 08:48:47
> > > > >     34512201        4645350592          2012-07-09 06:38:47
> > > > >     34512201        5657320532          2012-07-09 05:38:47
> > > > >     34512201        290419656539        2012-07-09 04:55:47
> > > > >  
> > > > > If you look at the data above, there are only two unique `BUYER_ID` 
> > > > > values, and for each of them I have `ITEM_ID` and `CREATED_TIME`. 
> > > > > I need only the 10 latest records, based on `CREATED_TIME`, for a 
> > > > > given day for each `BUYER_ID`.  
> > > > >  
> > > > > So for `BUYER_ID` `34512201` I need the 10 latest records based on 
> > > > > `CREATED_TIME` for that given day only.
> > > > >  
> > > > > Each `BUYER_ID` can have data from any day, but I am specifically 
> > > > > interested in the day before today's data (meaning always yesterday's 
> > > > > date), as determined from `CREATED_TIME`.
> > > > >  
> > > > > **Find the `TOP 10` rows for each `BUYER_ID`.** Below is the sample 
> > > > > output.
> > > > >  
> > > > >     BUYER_ID    |    ITEM_ID   | CREATED_TIME
> > > > >     ------------+------------------+-----------------------
> > > > >     34512201        2412180494          2012-07-09 22:21:11
> > > > >     34512201        1875744030          2012-07-09 19:40:39
> > > > >     34512201        597245693           2012-07-09 16:20:21
> > > > >     34512201        8071787728          2012-07-09 15:19:59
> > > > >     34512201        959188449           2012-07-09 08:48:47
> > > > >     34512201        5868222883          2012-07-09 08:23:17
> > > > >     34512201        5656232360          2012-07-09 07:28:47
> > > > >     34512201        5639158173          2012-07-09 06:58:47
> > > > >     34512201        4645350592          2012-07-09 06:38:47
> > > > >     34512201        2422054205          2012-07-09 06:09:56
> > > > >     1015826235      210002448035        2012-07-09 22:21:11
> > > > >     1015826235      260003553382        2012-07-09 19:40:39
> > > > >     1015826235      220003038067        2012-07-09 19:40:21
> > > > >     1015826235      300003861266        2012-07-09 18:19:59
> > > > >     1015826235      140002997245        2012-07-09 09:23:17
> > > > >     1015826235      260003553385        2012-07-09 08:48:47
> > > > >     1015826235      260003553384        2012-07-09 07:28:47
> > > > >     1015826235      260003553381        2012-07-09 07:09:56
> > > > >     1015826235      260003553383        2012-07-09 06:58:47
> > > > >     1015826235      260003553389        2012-07-09 06:54:37
> > > > >  
> > > > >  
> > > > > Raihan Jamal
> > > > >  
> > > > > On Mon, Jul 9, 2012 at 7:56 PM, Andes <ylyy-1...@163.com> wrote:
> > > > > > Hello, you can use "desc" and "limit 10" to filter the top 10.  
> > > > > >   
> > > > > > 2012-07-10
> > > > > >  
> > > > > > Best Regards
> > > > > > Andes
> > > > > >   
> > > > > >  
> > > > > >  
> > > > > > From: Raihan Jamal
> > > > > > Sent: 2012-07-10 10:31
> > > > > > Subject: Find TOP 10 using HiveQL
> > > > > > To: "user" <user@hive.apache.org>
> > > > > > Cc:
> > > > > >   
> > > > > > When I run this query,
> > > > > >  
> > > > > > SELECT TOP 10 FROM TestingTable1 WHERE ORDER BY buyer_id, 
> > > > > > created_time DESC;
> > > > > >  
> > > > > >  
> > > > > > I always get error as-  
> > > > > >  
> > > > > > FAILED: Parse Error: line 1:7 cannot recognize input 'TOP' in 
> > > > > > select expression  
> > > > > >  
> > > > > > Is there any way around to use TOP 10 or something similar that 
> > > > > > will work in HiveQL?
> > > > > >  
> > > > > >  
> > > > > > Raihan Jamal
> > > > >  
> > > >  
> > >  
> >  
> > --  
> > Nitin Pawar
> >  
>  
