Actually, never mind. Looks like you need to partition by both bid and pid.
In that case, your problem is that rank() has to handle a combined bid+pid
key. So first you need to create a combined key, partition by that key and
pass it to your rank() function (assuming rank() knows to reset on a new
key). You can cast bid and pid to string and concatenate them with a
separator (bid_pid) to get a single partitioning key. Hope this makes sense.

On Thu, Jul 19, 2012 at 3:57 PM, Igor Tatarinov <i...@decide.com> wrote:

> Sorry, just pid needs to be dropped from both DISTRIBUTE and SORT clauses.
> Your very first query was correct except for the nested subquery part.
> (You don't need a double-nested subquery.)
>
> On Thu, Jul 19, 2012 at 3:48 PM, comptech geeky 
> <comptechge...@gmail.com>wrote:
>
>> Hi Igor,
>>
>> I am not sure what I have to remove from Distribute By as in distribute
>> by we have bid, pid and you said remove bid and time from distribute by and
>> it doesn't have time
>>
>> *SELECT bid, pid, rank FROM *
>>       *(SELECT bid, pid, rank(bid) rank, time, UNIX_TIMESTAMP(time)
>> FROM
>> *
>>   *
>>            ( SELECT bid, pid, time FROM table1*
>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) =
>> '2012-07-09' *
>> *            DISTRIBUTE BY bid,pid*
>> *            SORT BY bid,pid, time desc) A
>> *
>> *
>>       ) B
>> *
>> *WHERE rank < 3;*
>>
>>
>> And also I tried running the above query as it is. I am not getting
>> expected output instead of that I am getting output like this which is
>> wrong If you compare my expected output with the below output-
>>
>>  *1345653 110909316904    0*
>> *1345653 221065796761    1*
>> *1345653 330760137950    2*
>>
>>
>> On Thu, Jul 19, 2012 at 3:43 PM, Igor Tatarinov <i...@decide.com> wrote:
>>
>>> Remove pid,time from DISTRIBUTE BY.
>>>
>>> On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky <comptechge...@gmail.com
>>> > wrote:
>>>
>>>> Modified Query that I wrote and its not working as expected output is.
>>>>
>>>> *
>>>> *
>>>> *SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)*
>>>> *FROM (*
>>>> *    SELECT bid, pid, time*
>>>> *    FROM table1*
>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) =
>>>> '2012-07-09'*
>>>> *    DISTRIBUTE BY bid,pid,time*
>>>> *    SORT BY bid, time desc*
>>>> *) a*
>>>> *WHERE rank(bid) < 3;*
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jul 19, 2012 at 1:41 PM, comptech geeky <
>>>> comptechge...@gmail.com> wrote:
>>>>
>>>>> I wrote this query after modifying it-
>>>>>
>>>>> *SELECT buyer_id, item_id, rank(buyer_id), created_time,
>>>>> UNIX_TIMESTAMP(created_time)*
>>>>> *FROM (*
>>>>> *    SELECT buyer_id, item_id, created_time*
>>>>> *    FROM testingtable1*
>>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as
>>>>> int))) = '2012-07-09'*
>>>>> *    DISTRIBUTE BY buyer_id,item_id*
>>>>> *    SORT BY buyer_id, created_time desc*
>>>>> *) a*
>>>>> *WHERE rank(buyer_id) < 3;*
>>>>>
>>>>> And the output I got is which is sligtly wrong as it is missing two
>>>>> rows-
>>>>>
>>>>> *1345653    330760137950       2012-07-09 21:43:29*
>>>>> *1345653    330760137950       2012-07-09 21:42:29*
>>>>> *1345653    330760137950       2012-07-09 21:41:29*
>>>>>
>>>>> These two rows are missing-
>>>>>
>>>>> 1345653    110909316904       2012-07-09 21:29:06
>>>>> 1345653    221065796761       2012-07-09 19:31:48
>>>>>
>>>>>  So full output should be like this-
>>>>>
>>>>> *1345653    330760137950       2012-07-09 21:43:29*
>>>>> *1345653    330760137950       2012-07-09 21:42:29*
>>>>> *1345653    330760137950       2012-07-09 21:41:29*
>>>>> 1345653    110909316904       2012-07-09 21:29:06
>>>>> 1345653    221065796761       2012-07-09 19:31:48
>>>>>
>>>>>
>>>>> On Thu, Jul 19, 2012 at 1:29 PM, comptech geeky <
>>>>> comptechge...@gmail.com> wrote:
>>>>>
>>>>>> Can you show me the exact query that I need to do for this particular
>>>>>> problem consideing my scenario? It will be of great help to me. As I am 
>>>>>> new
>>>>>> to HiveQL.
>>>>>>
>>>>>> I need TOP 3 for those if BID and PID gets matched but with different
>>>>>> timestamp.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 19, 2012 at 1:15 PM, Philip Tromans <
>>>>>> philip.j.trom...@gmail.com> wrote:
>>>>>>
>>>>>>> Your rank() is being evaluated map side. Put your distribute by and
>>>>>>> sort by in an inner query, and then evaluate your rank() in an outer 
>>>>>>> query.
>>>>>>>
>>>>>>> Phil.
>>>>>>> On Jul 19, 2012 9:00 PM, "comptech geeky" <comptechge...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> This is the below data in my Table1
>>>>>>>>
>>>>>>>>
>>>>>>>> BID       PID                       TIME
>>>>>>>>
>>>>>>>> --------------+-------------------------+--------------------------------
>>>>>>>> 1345653   330760137950       2012-07-09 21:42:29
>>>>>>>> 1345653   330760137950       2012-07-09 21:43:29
>>>>>>>> 1345653   330760137950       2012-07-09 21:40:29
>>>>>>>> 1345653   330760137950       2012-07-09 21:41:29
>>>>>>>> 1345653   110909316904       2012-07-09 21:29:06
>>>>>>>> 1345653   221065796761       2012-07-09 19:31:48
>>>>>>>>
>>>>>>>> So If I need to clarify the above scenario- I have data in above
>>>>>>>> table like this-
>>>>>>>> For USER *`1345653` *I have this PID `*330760137950` *four times
>>>>>>>> but with different timestamps in red color. So I need the output 
>>>>>>>> something
>>>>>>>> like this-
>>>>>>>>
>>>>>>>> Output that I need:-
>>>>>>>>
>>>>>>>> *1345653    330760137950       2012-07-09 21:43:29 *
>>>>>>>> *1345653    330760137950       2012-07-09 21:42:29 *
>>>>>>>> *1345653    330760137950       2012-07-09 21:41:29*
>>>>>>>> 1345653    110909316904       2012-07-09 21:29:06
>>>>>>>> 1345653    221065796761       2012-07-09 19:31:48
>>>>>>>>
>>>>>>>> So Basically If BID and PID are same but with different timestamps,
>>>>>>>> then I need TOP 3 sorted with TIME in descending order
>>>>>>>>
>>>>>>>> And for this I created rank UDF (User Defined Function). And I
>>>>>>>> wrote the below query but its not working for me. Can anyone help me on
>>>>>>>> this?
>>>>>>>> *
>>>>>>>> *
>>>>>>>> *
>>>>>>>> *
>>>>>>>> *SELECT buyer_id, item_id, created_time*
>>>>>>>> *    FROM table1*
>>>>>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as
>>>>>>>> int))) = '2012-07-09'*
>>>>>>>> *    DISTRIBUTE BY buyer_id*
>>>>>>>> *    SORT BY buyer_id, created_time desc*
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Reply via email to