Hi Igor,

I am not sure what I have to remove from Distribute By as in distribute by
we have bid, pid and you said remove bid and time from distribute by and it
doesn't have time

*SELECT bid, pid, rank FROM *
      *(SELECT bid, pid, rank(bid) rank, time, UNIX_TIMESTAMP(time)  FROM
*
  *
           ( SELECT bid, pid, time FROM table1*
* where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) =
'2012-07-09' *
*            DISTRIBUTE BY bid,pid*
*            SORT BY bid,pid, time desc) A
*
*
      ) B
*
*WHERE rank < 3;*


And also I tried running the above query as it is. I am not getting
expected output instead of that I am getting output like this which is
wrong If you compare my expected output with the below output-

*1345653 110909316904    0*
*1345653 221065796761    1*
*1345653 330760137950    2*


On Thu, Jul 19, 2012 at 3:43 PM, Igor Tatarinov <i...@decide.com> wrote:

> Remove pid,time from DISTRIBUTE BY.
>
> On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky 
> <comptechge...@gmail.com>wrote:
>
>> Modified Query that I wrote and its not working as expected output is.
>>
>> *
>> *
>> *SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)*
>> *FROM (*
>> *    SELECT bid, pid, time*
>> *    FROM table1*
>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) =
>> '2012-07-09'*
>> *    DISTRIBUTE BY bid,pid,time*
>> *    SORT BY bid, time desc*
>> *) a*
>> *WHERE rank(bid) < 3;*
>>
>>
>>
>>
>>
>> On Thu, Jul 19, 2012 at 1:41 PM, comptech geeky 
>> <comptechge...@gmail.com>wrote:
>>
>>> I wrote this query after modifying it-
>>>
>>> *SELECT buyer_id, item_id, rank(buyer_id), created_time,
>>> UNIX_TIMESTAMP(created_time)*
>>> *FROM (*
>>> *    SELECT buyer_id, item_id, created_time*
>>> *    FROM testingtable1*
>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as
>>> int))) = '2012-07-09'*
>>> *    DISTRIBUTE BY buyer_id,item_id*
>>> *    SORT BY buyer_id, created_time desc*
>>> *) a*
>>> *WHERE rank(buyer_id) < 3;*
>>>
>>> And the output I got is which is sligtly wrong as it is missing two rows-
>>>
>>> *1345653    330760137950       2012-07-09 21:43:29*
>>> *1345653    330760137950       2012-07-09 21:42:29*
>>> *1345653    330760137950       2012-07-09 21:41:29*
>>>
>>> These two rows are missing-
>>>
>>> 1345653    110909316904       2012-07-09 21:29:06
>>> 1345653    221065796761       2012-07-09 19:31:48
>>>
>>>  So full output should be like this-
>>>
>>> *1345653    330760137950       2012-07-09 21:43:29*
>>> *1345653    330760137950       2012-07-09 21:42:29*
>>> *1345653    330760137950       2012-07-09 21:41:29*
>>> 1345653    110909316904       2012-07-09 21:29:06
>>> 1345653    221065796761       2012-07-09 19:31:48
>>>
>>>
>>> On Thu, Jul 19, 2012 at 1:29 PM, comptech geeky <comptechge...@gmail.com
>>> > wrote:
>>>
>>>> Can you show me the exact query that I need to do for this particular
>>>> problem consideing my scenario? It will be of great help to me. As I am new
>>>> to HiveQL.
>>>>
>>>> I need TOP 3 for those if BID and PID gets matched but with different
>>>> timestamp.
>>>>
>>>>
>>>>
>>>> On Thu, Jul 19, 2012 at 1:15 PM, Philip Tromans <
>>>> philip.j.trom...@gmail.com> wrote:
>>>>
>>>>> Your rank() is being evaluated map side. Put your distribute by and
>>>>> sort by in an inner query, and then evaluate your rank() in an outer 
>>>>> query.
>>>>>
>>>>> Phil.
>>>>> On Jul 19, 2012 9:00 PM, "comptech geeky" <comptechge...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> This is the below data in my Table1
>>>>>>
>>>>>>
>>>>>> BID       PID                       TIME
>>>>>>
>>>>>> --------------+-------------------------+--------------------------------
>>>>>> 1345653   330760137950       2012-07-09 21:42:29
>>>>>> 1345653   330760137950       2012-07-09 21:43:29
>>>>>> 1345653   330760137950       2012-07-09 21:40:29
>>>>>> 1345653   330760137950       2012-07-09 21:41:29
>>>>>> 1345653   110909316904       2012-07-09 21:29:06
>>>>>> 1345653   221065796761       2012-07-09 19:31:48
>>>>>>
>>>>>> So If I need to clarify the above scenario- I have data in above
>>>>>> table like this-
>>>>>> For USER *`1345653` *I have this PID `*330760137950` *four times but
>>>>>> with different timestamps in red color. So I need the output something 
>>>>>> like
>>>>>> this-
>>>>>>
>>>>>> Output that I need:-
>>>>>>
>>>>>> *1345653    330760137950       2012-07-09 21:43:29 *
>>>>>> *1345653    330760137950       2012-07-09 21:42:29 *
>>>>>> *1345653    330760137950       2012-07-09 21:41:29*
>>>>>> 1345653    110909316904       2012-07-09 21:29:06
>>>>>> 1345653    221065796761       2012-07-09 19:31:48
>>>>>>
>>>>>> So Basically If BID and PID are same but with different timestamps,
>>>>>> then I need TOP 3 sorted with TIME in descending order
>>>>>>
>>>>>> And for this I created rank UDF (User Defined Function). And I wrote
>>>>>> the below query but its not working for me. Can anyone help me on this?
>>>>>> *
>>>>>> *
>>>>>> *
>>>>>> *
>>>>>> *SELECT buyer_id, item_id, created_time*
>>>>>> *    FROM table1*
>>>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as
>>>>>> int))) = '2012-07-09'*
>>>>>> *    DISTRIBUTE BY buyer_id*
>>>>>> *    SORT BY buyer_id, created_time desc*
>>>>>>
>>>>>>
>>>>
>>>
>>
>

Reply via email to