Yup this works. Thanks for the help.


*Raihan Jamal*



On Tue, Jul 10, 2012 at 4:37 PM, Vijay <tec...@gmail.com> wrote:

> In that case, wouldn't this work:
>
> SELECT buyer_id, item_id, rank(buyer_id), created_time
> FROM (
>     SELECT buyer_id, item_id, created_time
>     FROM testingtable1
>     DISTRIBUTE BY buyer_id
>     SORT BY buyer_id, created_time desc
> ) a
> WHERE rank(buyer_id) < 10;
>
>
> On Tue, Jul 10, 2012 at 4:21 PM, Raihan Jamal <jamalrai...@gmail.com>
> wrote:
> > So the OUTPUT should be like this everything sorted in descending order
> with
> > time and only top 10 for each BUYER_ID-
> >
> > Sample Output.
> >
> > BUYER_ID    |    ITEM_ID       |    CREATED_TIME
> >
> >
> > ------------+------------------+-----------------------
> >
> >
> >
> > 34512201        2412180494          2012-07-09 22:21:11
> >
> >
> > 34512201        1875744030          2012-07-09 19:40:39
> >
> >
> > 34512201        597245693           2012-07-09 16:20:21
> >
> >
> > 34512201        8071787728          2012-07-09 15:19:59
> >
> >
> > 34512201        959188449           2012-07-09 08:48:47
> >
> >
> > 34512201        5868222883          2012-07-09 08:23:17
> >
> >
> > 34512201        5656232360          2012-07-09 07:28:47
> >
> >
> > 34512201        5639158173          2012-07-09 06:58:47
> >
> >
> > 34512201        4645350592          2012-07-09 06:38:47
> >
> >
> > 34512201        2422054205          2012-07-09 06:09:56
> >
> >
> > 1015826235      210002448035        2012-07-09 22:21:11
> >
> >
> > 1015826235      260003553382        2012-07-09 19:40:39
> >
> >
> > 1015826235      220003038067        2012-07-09 19:40:21
> >
> >
> > 1015826235      300003861266        2012-07-09 18:19:59
> >
> >
> > 1015826235      140002997245        2012-07-09 09:23:17
> >
> >
> > 1015826235      260003553385        2012-07-09 08:48:47
> >
> >
> > 1015826235      260003553384        2012-07-09 07:28:47
> >
> >
> > 1015826235      260003553381        2012-07-09 07:09:56
> >
> >
> > 1015826235      260003553383        2012-07-09 06:58:47
> >
> >
> > 1015826235      260003553389        2012-07-09 06:54:37
> >
> >
> >
> >
> >
> > Raihan Jamal
> >
> >
> >
> > On Tue, Jul 10, 2012 at 4:18 PM, Raihan Jamal <jamalrai...@gmail.com>
> wrote:
> >>
> >> Thanks Vijay for reply. But it doesn't works out the way I needed. I am
> >> getting full data back for each BUYER_ID. Let me explain you more.
> >>
> >> This is the below data in the table and you can see in the below data
> this
> >> BUYER_ID - 1015826235 appears 13 times and this BUYER_ID -34512201
> appears
> >> 12 times, And I need TOP 10 for each of the BUYER_ID BASIS on time-
> >>
> >> BUYER_ID    |    ITEM_ID       |    CREATED_TIME
> >>
> >>
> >>
> >> ------------+------------------+-----------------------
> >>
> >>
> >>
> >> 1015826235      220003038067        2012-07-09 19:40:21,
> >>
> >>
> >>
> >> 1015826235      300003861266        2012-07-09 18:19:59,
> >>
> >>
> >>
> >> 1015826235      140002997245        2012-07-09 09:23:17,
> >>
> >>
> >>
> >> 1015826235      210002448035        2012-07-09 22:21:11,
> >>
> >>
> >>
> >> 1015826235      260003553381        2012-07-09 07:09:56,
> >>
> >>
> >>
> >> 1015826235      260003553382        2012-07-09 19:40:39,
> >>
> >>
> >>
> >> 1015826235      260003553383        2012-07-09 06:58:47,
> >>
> >>
> >>
> >> 1015826235      260003553384        2012-07-09 07:28:47,
> >>
> >>
> >>
> >> 1015826235      260003553385        2012-07-09 08:48:47,
> >>
> >>
> >>
> >> 1015826235      260003553386        2012-07-09 06:38:47,
> >>
> >>
> >>
> >> 1015826235      260003553387        2012-07-09 05:38:47,
> >>
> >>
> >>
> >> 1015826235      260003553388        2012-07-09 04:55:47,
> >>
> >>
> >>
> >> 1015826235      260003553389        2012-07-09 06:54:37,
> >>
> >>
> >>
> >> 34512201        597245693           2012-07-09 16:20:21,
> >>
> >>
> >>
> >> 34512201        8071787728          2012-07-09 15:19:59,
> >>
> >>
> >>
> >> 34512201        5868222883          2012-07-09 08:23:17,
> >>
> >>
> >>
> >> 34512201        2412180494          2012-07-09 22:21:11,
> >>
> >>
> >>
> >> 34512201        2422054205          2012-07-09 06:09:56,
> >>
> >>
> >>
> >> 34512201        1875744030          2012-07-09 19:40:39,
> >>
> >>
> >>
> >> 34512201        5639158173          2012-07-09 06:58:47,
> >>
> >>
> >>
> >> 34512201        5656232360          2012-07-09 07:28:47,
> >>
> >>
> >>
> >> 34512201        959188449           2012-07-09 08:48:47,
> >>
> >>
> >>
> >> 34512201        4645350592          2012-07-09 06:38:47,
> >>
> >>
> >>
> >> 34512201        5657320532          2012-07-09 05:38:47,
> >>
> >>
> >>
> >> 34512201        290419656539        2012-07-09 04:55:47,
> >>
> >>
> >> So basically if I do it like this-
> >>
> >> SELECT * FROM TestingTable1 ORDER BY buyer_id, created_time DESC;
> >>
> >> everything will get sorted in descending order for each BUYER_ID and
> then
> >> I need to pick up TOP 10 rows for each BUYER_ID.
> >>
> >> And with your query that you just said, it's not working. I am getting
> >> full data back like this. Below is the result I got from your query-
> >>
> >> 34512201        597245693       0       2012-07-09 16:20:21
> >> 34512201        959188449       0       2012-07-09 08:48:47
> >> 34512201        1875744030      0       2012-07-09 19:40:39
> >> 34512201        2412180494      0       2012-07-09 22:21:11
> >> 34512201        2422054205      0       2012-07-09 06:09:56
> >> 34512201        4645350592      0       2012-07-09 06:38:47
> >> 34512201        5639158173      0       2012-07-09 06:58:47
> >> 34512201        5656232360      0       2012-07-09 07:28:47
> >> 34512201        5657320532      0       2012-07-09 05:38:47
> >> 34512201        5868222883      0       2012-07-09 08:23:17
> >> 34512201        8071787728      0       2012-07-09 15:19:59
> >> 34512201        290419656539    0       2012-07-09 04:55:47
> >> 1015826235      140002997245    0       2012-07-09 09:23:17
> >> 1015826235      210002448035    0       2012-07-09 22:21:11
> >> 1015826235      220003038067    0       2012-07-09 19:40:21
> >> 1015826235      260003553381    0       2012-07-09 07:09:56
> >> 1015826235      260003553382    0       2012-07-09 19:40:39
> >> 1015826235      260003553383    0       2012-07-09 06:58:47
> >> 1015826235      260003553384    0       2012-07-09 07:28:47
> >> 1015826235      260003553385    0       2012-07-09 08:48:47
> >> 1015826235      260003553386    0       2012-07-09 06:38:47
> >> 1015826235      260003553387    0       2012-07-09 05:38:47
> >> 1015826235      260003553388    0       2012-07-09 04:55:47
> >> 1015826235      260003553389    0       2012-07-09 06:54:37
> >> 1015826235      300003861266    0       2012-07-09 18:19:59
> >>
> >>
> >> This is my Rank function that I have created-
> >>
> >> package com.example.hive.udf;
> >> import org.apache.hadoop.hive.ql.exec.UDF;
> >>
> >> public final class Rank extends UDF{
> >>     private int  counter;
> >>     private String last_key;
> >>     public int evaluate(final String key){
> >>  if ( !key.equalsIgnoreCase(this.last_key) ) {
> >>     this.counter = 0;
> >>     this.last_key = key;
> >>  }
> >>  return this.counter++;
> >>     }
> >> }
> >>
> >>
> >> Raihan Jamal
> >>
> >>
> >>
> >> On Tue, Jul 10, 2012 at 4:01 PM, Vijay <tec...@gmail.com> wrote:
> >>>
> >>> This is a little tricky but this is how it works:
> >>>
> >>> SELECT buyer_id, item_id, rank(item_id), created_time
> >>> FROM (
> >>>     SELECT buyer_id, item_id, created_time
> >>>     FROM testingtable1
> >>>     DISTRIBUTE BY buyer_id, item_id
> >>>     SORT BY buyer_id, item_id, created_time desc
> >>> ) a
> >>> WHERE rank(item_id) < 10;
> >>>
> >>>
> >>> On Tue, Jul 10, 2012 at 12:31 PM, Raihan Jamal <jamalrai...@gmail.com>
> >>> wrote:
> >>> > Still it's not working with the use of my rank UDF. Below is the
> query
> >>> > I am
> >>> > using
> >>> >
> >>> > Can anyone help me, what changes I need to make in my below sql
> query?
> >>> >
> >>> >
> >>> > CREATE TABLE IF NOT EXISTS TestingTable1
> >>> >
> >>> >
> >>> > (
> >>> >
> >>> >
> >>> > BUYER_ID BIGINT,
> >>> >
> >>> >
> >>> > ITEM_ID BIGINT,
> >>> >
> >>> >
> >>> > CREATED_TIME STRING
> >>> > )
> >>> >
> >>> >
> >>> > Find TOP 10 latest data (ITEM_ID, CREATED_TIME) for each BUYER_ID for
> >>> > yesterday's date by sorting the created_time in descending order.
> >>> >
> >>> >
> >>> > So what I was thinking is that, with the use of this below query, I
> >>> > will be
> >>> > getting-
> >>> >
> >>> >
> >>> > SELECT * FROM TestingTable1 WHERE ORDER BY buyer_id, created_time
> DESC;
> >>> >
> >>> >
> >>> > All the BUYER_ID and its ITEM_ID corresponding to CREATED_TIME in
> >>> > Descending
> >>> > order, But how I can pick TOP 10 for each BUYER_ID.?
> >>> >
> >>> > This is my SQL Query that I am using with the use of rank UDF.
> >>> >
> >>> > SELECT buyer_id, item_id, created_time, rk
> >>> > FROM (
> >>> >     SELECT buyer_id, item_id, rank(item_id) as rk, created_time
> >>> >     FROM testingtable1
> >>> >     DISTRIBUTE BY buyer_id, item_id
> >>> >     SORT BY buyer_id, item_id, created_time desc
> >>> > ) a
> >>> > WHERE rk < 10
> >>> > ORDER BY buyer_id, created_time, rk;
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > Raihan Jamal
> >>> >
> >>> >
> >>> >
> >>> > On Tue, Jul 10, 2012 at 12:16 AM, Jasper Knulst
> >>> > <jasper.knu...@incentro.com>
> >>> > wrote:
> >>> >>
> >>> >> Hi Raihan,
> >>> >>
> >>> >> You should use 'rank(buyer_id)' in the order by clause on line 9 in
> >>> >> stead
> >>> >> of the alias 'rk'. I had the same problem, strangely, the alias is
> not
> >>> >> resolved when it is in the order by clause.
> >>> >>
> >>> >> Other thing, I had some issues when I used this exact same set up
> for
> >>> >> ranking results, that the rank UDF was already implemented at the
> map
> >>> >> phase.
> >>> >> Then you get very strange results. You have to introduce an extra
> >>> >> subquery
> >>> >> for it to work.
> >>> >>
> >>> >> Jasper
> >>> >>
> >>> >> Op 10 jul. 2012 09:01 schreef "Raihan Jamal" <jamalrai...@gmail.com
> >
> >>> >> het
> >>> >> volgende:
> >>> >>
> >>> >>> I was not able to understand, This is my below qiuery that I am
> using
> >>> >>> currently-
> >>> >>>
> >>> >>> SELECT buyer_id, item_id, created_time
> >>> >>> FROM (
> >>> >>>     SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time
> >>> >>>     FROM testingtable1
> >>> >>>     DISTRIBUTE BY buyer_id, item_id
> >>> >>>     SORT BY buyer_id, item_id, created_time desc
> >>> >>> ) a
> >>> >>> WHERE rank < 10
> >>> >>> ORDER BY buyer_id, created_time, rank;
> >>> >>>
> >>> >>>
> >>> >>> What changes I need to make?
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> Raihan Jamal
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar
> >>> >>> <nitinpawar...@gmail.com>
> >>> >>> wrote:
> >>> >>>>
> >>> >>>> try rk in upper select statement as well
> >>> >>>>
> >>> >>>>
> >>> >>>> On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal
> >>> >>>> <jamalrai...@gmail.com>
> >>> >>>> wrote:
> >>> >>>>>
> >>> >>>>> Thanks for commenting out. Yes I figured that out, its a UDF. So
> >>> >>>>> now I
> >>> >>>>> have created a new UDF Rank and added to classpath also. But
> when I
> >>> >>>>> am again
> >>> >>>>> running the below query-
> >>> >>>>>
> >>> >>>>> SELECT buyer_id, item_id, created_time
> >>> >>>>> FROM (
> >>> >>>>>     SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time
> >>> >>>>>     FROM testingtable1
> >>> >>>>>     DISTRIBUTE BY buyer_id, item_id
> >>> >>>>>     SORT BY buyer_id, item_id, created_time desc
> >>> >>>>> ) a
> >>> >>>>> WHERE rk < 10
> >>> >>>>> ORDER BY buyer_id, rk;
> >>> >>>>>
> >>> >>>>> I am getting this below error-
> >>> >>>>>
> >>> >>>>> FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias
> >>> >>>>> or
> >>> >>>>> Column Reference rk
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> Why is it so? Any suggestions?
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> Raihan Jamal
> >>> >>>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <tec...@gmail.com> wrote:
> >>> >>>>>>
> >>> >>>>>> hive has no built-in rank function. you'd need to use a
> >>> >>>>>> user-defined
> >>> >>>>>> function (UDF) to simulate it. there are a few custom
> >>> >>>>>> implementations
> >>> >>>>>> on the net that you can leverage.
> >>> >>>>>>
> >>> >>>>>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal
> >>> >>>>>> <jamalrai...@gmail.com>
> >>> >>>>>> wrote:
> >>> >>>>>> > What's wrong with the below query.
> >>> >>>>>> >
> >>> >>>>>> >
> >>> >>>>>> > SELECT buyer_id, item_id, created_time
> >>> >>>>>> > FROM (
> >>> >>>>>> >     SELECT buyer_id, item_id, rank(buyer_id) as rank,
> >>> >>>>>> > created_time
> >>> >>>>>> >     FROM testingtable1
> >>> >>>>>> >     DISTRIBUTE BY buyer_id, item_id
> >>> >>>>>> >     SORT BY buyer_id, item_id, created_time desc
> >>> >>>>>> > ) a
> >>> >>>>>> > WHERE rank < 10
> >>> >>>>>> > ORDER BY buyer_id, rank
> >>> >>>>>> >
> >>> >>>>>> >
> >>> >>>>>> > I am always getting as -
> >>> >>>>>> >
> >>> >>>>>> > FAILED: Error in semantic analysis: line 3:30 Invalid Function
> >>> >>>>>> > rank
> >>> >>>>>> >
> >>> >>>>>> >
> >>> >>>>>> > I am using Hive 0.6 I guess.
> >>> >>>>>> >
> >>> >>>>>> >
> >>> >>>>>> >
> >>> >>>>>> > Raihan Jamal
> >>> >>>>>> >
> >>> >>>>>
> >>> >>>>>
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> --
> >>> >>>> Nitin Pawar
> >>> >>>>
> >>> >>>
> >>> >
> >>
> >>
> >
>

Reply via email to