Yup this works. Thanks for the help.
*Raihan Jamal* On Tue, Jul 10, 2012 at 4:37 PM, Vijay <tec...@gmail.com> wrote: > In that case, wouldn't this work: > > SELECT buyer_id, item_id, rank(buyer_id), created_time > FROM ( > SELECT buyer_id, item_id, created_time > FROM testingtable1 > DISTRIBUTE BY buyer_id > SORT BY buyer_id, created_time desc > ) a > WHERE rank(buyer_id) < 10; > > > On Tue, Jul 10, 2012 at 4:21 PM, Raihan Jamal <jamalrai...@gmail.com> > wrote: > > So the OUTPUT should be like this everything sorted in descending order > with > > time and only top 10 for each BUYER_ID- > > > > Sample Output. > > > > BUYER_ID | ITEM_ID | CREATED_TIME > > > > > > ------------+------------------+----------------------- > > > > > > > > 34512201 2412180494 2012-07-09 22:21:11 > > > > > > 34512201 1875744030 2012-07-09 19:40:39 > > > > > > 34512201 597245693 2012-07-09 16:20:21 > > > > > > 34512201 8071787728 2012-07-09 15:19:59 > > > > > > 34512201 959188449 2012-07-09 08:48:47 > > > > > > 34512201 5868222883 2012-07-09 08:23:17 > > > > > > 34512201 5656232360 2012-07-09 07:28:47 > > > > > > 34512201 5639158173 2012-07-09 06:58:47 > > > > > > 34512201 4645350592 2012-07-09 06:38:47 > > > > > > 34512201 2422054205 2012-07-09 06:09:56 > > > > > > 1015826235 210002448035 2012-07-09 22:21:11 > > > > > > 1015826235 260003553382 2012-07-09 19:40:39 > > > > > > 1015826235 220003038067 2012-07-09 19:40:21 > > > > > > 1015826235 300003861266 2012-07-09 18:19:59 > > > > > > 1015826235 140002997245 2012-07-09 09:23:17 > > > > > > 1015826235 260003553385 2012-07-09 08:48:47 > > > > > > 1015826235 260003553384 2012-07-09 07:28:47 > > > > > > 1015826235 260003553381 2012-07-09 07:09:56 > > > > > > 1015826235 260003553383 2012-07-09 06:58:47 > > > > > > 1015826235 260003553389 2012-07-09 06:54:37 > > > > > > > > > > > > Raihan Jamal > > > > > > > > On Tue, Jul 10, 2012 at 4:18 PM, Raihan Jamal <jamalrai...@gmail.com> > wrote: > >> > >> Thanks Vijay for reply. But it doesn't works out the way I needed. I am > >> getting full data back for each BUYER_ID. Let me explain you more. > >> > >> This is the below data in the table and you can see in the below data > this > >> BUYER_ID - 1015826235 appears 13 times and this BUYER_ID -34512201 > appears > >> 12 times, And I need TOP 10 for each of the BUYER_ID BASIS on time- > >> > >> BUYER_ID | ITEM_ID | CREATED_TIME > >> > >> > >> > >> ------------+------------------+----------------------- > >> > >> > >> > >> 1015826235 220003038067 2012-07-09 19:40:21, > >> > >> > >> > >> 1015826235 300003861266 2012-07-09 18:19:59, > >> > >> > >> > >> 1015826235 140002997245 2012-07-09 09:23:17, > >> > >> > >> > >> 1015826235 210002448035 2012-07-09 22:21:11, > >> > >> > >> > >> 1015826235 260003553381 2012-07-09 07:09:56, > >> > >> > >> > >> 1015826235 260003553382 2012-07-09 19:40:39, > >> > >> > >> > >> 1015826235 260003553383 2012-07-09 06:58:47, > >> > >> > >> > >> 1015826235 260003553384 2012-07-09 07:28:47, > >> > >> > >> > >> 1015826235 260003553385 2012-07-09 08:48:47, > >> > >> > >> > >> 1015826235 260003553386 2012-07-09 06:38:47, > >> > >> > >> > >> 1015826235 260003553387 2012-07-09 05:38:47, > >> > >> > >> > >> 1015826235 260003553388 2012-07-09 04:55:47, > >> > >> > >> > >> 1015826235 260003553389 2012-07-09 06:54:37, > >> > >> > >> > >> 34512201 597245693 2012-07-09 16:20:21, > >> > >> > >> > >> 34512201 8071787728 2012-07-09 15:19:59, > >> > >> > >> > >> 34512201 5868222883 2012-07-09 08:23:17, > >> > >> > >> > >> 34512201 2412180494 2012-07-09 22:21:11, > >> > >> > >> > >> 34512201 2422054205 2012-07-09 06:09:56, > >> > >> > >> > >> 34512201 1875744030 2012-07-09 19:40:39, > >> > >> > >> > >> 34512201 5639158173 2012-07-09 06:58:47, > >> > >> > >> > >> 34512201 5656232360 2012-07-09 07:28:47, > >> > >> > >> > >> 34512201 959188449 2012-07-09 08:48:47, > >> > >> > >> > >> 34512201 4645350592 2012-07-09 06:38:47, > >> > >> > >> > >> 34512201 5657320532 2012-07-09 05:38:47, > >> > >> > >> > >> 34512201 290419656539 2012-07-09 04:55:47, > >> > >> > >> So basically if I do it like this- > >> > >> SELECT * FROM TestingTable1 ORDER BY buyer_id, created_time DESC; > >> > >> everything will get sorted in descending order for each BUYER_ID and > then > >> I need to pick up TOP 10 rows for each BUYER_ID. > >> > >> And with your query that you just said, it's not working. I am getting > >> full data back like this. Below is the result I got from your query- > >> > >> 34512201 597245693 0 2012-07-09 16:20:21 > >> 34512201 959188449 0 2012-07-09 08:48:47 > >> 34512201 1875744030 0 2012-07-09 19:40:39 > >> 34512201 2412180494 0 2012-07-09 22:21:11 > >> 34512201 2422054205 0 2012-07-09 06:09:56 > >> 34512201 4645350592 0 2012-07-09 06:38:47 > >> 34512201 5639158173 0 2012-07-09 06:58:47 > >> 34512201 5656232360 0 2012-07-09 07:28:47 > >> 34512201 5657320532 0 2012-07-09 05:38:47 > >> 34512201 5868222883 0 2012-07-09 08:23:17 > >> 34512201 8071787728 0 2012-07-09 15:19:59 > >> 34512201 290419656539 0 2012-07-09 04:55:47 > >> 1015826235 140002997245 0 2012-07-09 09:23:17 > >> 1015826235 210002448035 0 2012-07-09 22:21:11 > >> 1015826235 220003038067 0 2012-07-09 19:40:21 > >> 1015826235 260003553381 0 2012-07-09 07:09:56 > >> 1015826235 260003553382 0 2012-07-09 19:40:39 > >> 1015826235 260003553383 0 2012-07-09 06:58:47 > >> 1015826235 260003553384 0 2012-07-09 07:28:47 > >> 1015826235 260003553385 0 2012-07-09 08:48:47 > >> 1015826235 260003553386 0 2012-07-09 06:38:47 > >> 1015826235 260003553387 0 2012-07-09 05:38:47 > >> 1015826235 260003553388 0 2012-07-09 04:55:47 > >> 1015826235 260003553389 0 2012-07-09 06:54:37 > >> 1015826235 300003861266 0 2012-07-09 18:19:59 > >> > >> > >> This is my Rank function that I have created- > >> > >> package com.example.hive.udf; > >> import org.apache.hadoop.hive.ql.exec.UDF; > >> > >> public final class Rank extends UDF{ > >> private int counter; > >> private String last_key; > >> public int evaluate(final String key){ > >> if ( !key.equalsIgnoreCase(this.last_key) ) { > >> this.counter = 0; > >> this.last_key = key; > >> } > >> return this.counter++; > >> } > >> } > >> > >> > >> Raihan Jamal > >> > >> > >> > >> On Tue, Jul 10, 2012 at 4:01 PM, Vijay <tec...@gmail.com> wrote: > >>> > >>> This is a little tricky but this is how it works: > >>> > >>> SELECT buyer_id, item_id, rank(item_id), created_time > >>> FROM ( > >>> SELECT buyer_id, item_id, created_time > >>> FROM testingtable1 > >>> DISTRIBUTE BY buyer_id, item_id > >>> SORT BY buyer_id, item_id, created_time desc > >>> ) a > >>> WHERE rank(item_id) < 10; > >>> > >>> > >>> On Tue, Jul 10, 2012 at 12:31 PM, Raihan Jamal <jamalrai...@gmail.com> > >>> wrote: > >>> > Still it's not working with the use of my rank UDF. Below is the > query > >>> > I am > >>> > using > >>> > > >>> > Can anyone help me, what changes I need to make in my below sql > query? > >>> > > >>> > > >>> > CREATE TABLE IF NOT EXISTS TestingTable1 > >>> > > >>> > > >>> > ( > >>> > > >>> > > >>> > BUYER_ID BIGINT, > >>> > > >>> > > >>> > ITEM_ID BIGINT, > >>> > > >>> > > >>> > CREATED_TIME STRING > >>> > ) > >>> > > >>> > > >>> > Find TOP 10 latest data (ITEM_ID, CREATED_TIME) for each BUYER_ID for > >>> > yesterday's date by sorting the created_time in descending order. > >>> > > >>> > > >>> > So what I was thinking is that, with the use of this below query, I > >>> > will be > >>> > getting- > >>> > > >>> > > >>> > SELECT * FROM TestingTable1 WHERE ORDER BY buyer_id, created_time > DESC; > >>> > > >>> > > >>> > All the BUYER_ID and its ITEM_ID corresponding to CREATED_TIME in > >>> > Descending > >>> > order, But how I can pick TOP 10 for each BUYER_ID.? > >>> > > >>> > This is my SQL Query that I am using with the use of rank UDF. > >>> > > >>> > SELECT buyer_id, item_id, created_time, rk > >>> > FROM ( > >>> > SELECT buyer_id, item_id, rank(item_id) as rk, created_time > >>> > FROM testingtable1 > >>> > DISTRIBUTE BY buyer_id, item_id > >>> > SORT BY buyer_id, item_id, created_time desc > >>> > ) a > >>> > WHERE rk < 10 > >>> > ORDER BY buyer_id, created_time, rk; > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > Raihan Jamal > >>> > > >>> > > >>> > > >>> > On Tue, Jul 10, 2012 at 12:16 AM, Jasper Knulst > >>> > <jasper.knu...@incentro.com> > >>> > wrote: > >>> >> > >>> >> Hi Raihan, > >>> >> > >>> >> You should use 'rank(buyer_id)' in the order by clause on line 9 in > >>> >> stead > >>> >> of the alias 'rk'. I had the same problem, strangely, the alias is > not > >>> >> resolved when it is in the order by clause. > >>> >> > >>> >> Other thing, I had some issues when I used this exact same set up > for > >>> >> ranking results, that the rank UDF was already implemented at the > map > >>> >> phase. > >>> >> Then you get very strange results. You have to introduce an extra > >>> >> subquery > >>> >> for it to work. > >>> >> > >>> >> Jasper > >>> >> > >>> >> Op 10 jul. 2012 09:01 schreef "Raihan Jamal" <jamalrai...@gmail.com > > > >>> >> het > >>> >> volgende: > >>> >> > >>> >>> I was not able to understand, This is my below qiuery that I am > using > >>> >>> currently- > >>> >>> > >>> >>> SELECT buyer_id, item_id, created_time > >>> >>> FROM ( > >>> >>> SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time > >>> >>> FROM testingtable1 > >>> >>> DISTRIBUTE BY buyer_id, item_id > >>> >>> SORT BY buyer_id, item_id, created_time desc > >>> >>> ) a > >>> >>> WHERE rank < 10 > >>> >>> ORDER BY buyer_id, created_time, rank; > >>> >>> > >>> >>> > >>> >>> What changes I need to make? > >>> >>> > >>> >>> > >>> >>> > >>> >>> > >>> >>> Raihan Jamal > >>> >>> > >>> >>> > >>> >>> > >>> >>> On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar > >>> >>> <nitinpawar...@gmail.com> > >>> >>> wrote: > >>> >>>> > >>> >>>> try rk in upper select statement as well > >>> >>>> > >>> >>>> > >>> >>>> On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal > >>> >>>> <jamalrai...@gmail.com> > >>> >>>> wrote: > >>> >>>>> > >>> >>>>> Thanks for commenting out. Yes I figured that out, its a UDF. So > >>> >>>>> now I > >>> >>>>> have created a new UDF Rank and added to classpath also. But > when I > >>> >>>>> am again > >>> >>>>> running the below query- > >>> >>>>> > >>> >>>>> SELECT buyer_id, item_id, created_time > >>> >>>>> FROM ( > >>> >>>>> SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time > >>> >>>>> FROM testingtable1 > >>> >>>>> DISTRIBUTE BY buyer_id, item_id > >>> >>>>> SORT BY buyer_id, item_id, created_time desc > >>> >>>>> ) a > >>> >>>>> WHERE rk < 10 > >>> >>>>> ORDER BY buyer_id, rk; > >>> >>>>> > >>> >>>>> I am getting this below error- > >>> >>>>> > >>> >>>>> FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias > >>> >>>>> or > >>> >>>>> Column Reference rk > >>> >>>>> > >>> >>>>> > >>> >>>>> Why is it so? Any suggestions? > >>> >>>>> > >>> >>>>> > >>> >>>>> Raihan Jamal > >>> >>>>> > >>> >>>>> > >>> >>>>> > >>> >>>>> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <tec...@gmail.com> wrote: > >>> >>>>>> > >>> >>>>>> hive has no built-in rank function. you'd need to use a > >>> >>>>>> user-defined > >>> >>>>>> function (UDF) to simulate it. there are a few custom > >>> >>>>>> implementations > >>> >>>>>> on the net that you can leverage. > >>> >>>>>> > >>> >>>>>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal > >>> >>>>>> <jamalrai...@gmail.com> > >>> >>>>>> wrote: > >>> >>>>>> > What's wrong with the below query. > >>> >>>>>> > > >>> >>>>>> > > >>> >>>>>> > SELECT buyer_id, item_id, created_time > >>> >>>>>> > FROM ( > >>> >>>>>> > SELECT buyer_id, item_id, rank(buyer_id) as rank, > >>> >>>>>> > created_time > >>> >>>>>> > FROM testingtable1 > >>> >>>>>> > DISTRIBUTE BY buyer_id, item_id > >>> >>>>>> > SORT BY buyer_id, item_id, created_time desc > >>> >>>>>> > ) a > >>> >>>>>> > WHERE rank < 10 > >>> >>>>>> > ORDER BY buyer_id, rank > >>> >>>>>> > > >>> >>>>>> > > >>> >>>>>> > I am always getting as - > >>> >>>>>> > > >>> >>>>>> > FAILED: Error in semantic analysis: line 3:30 Invalid Function > >>> >>>>>> > rank > >>> >>>>>> > > >>> >>>>>> > > >>> >>>>>> > I am using Hive 0.6 I guess. > >>> >>>>>> > > >>> >>>>>> > > >>> >>>>>> > > >>> >>>>>> > Raihan Jamal > >>> >>>>>> > > >>> >>>>> > >>> >>>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> >>>> -- > >>> >>>> Nitin Pawar > >>> >>>> > >>> >>> > >>> > > >> > >> > > >