Hi Igor, I am not sure what I have to remove from Distribute By as in distribute by we have bid, pid and you said remove bid and time from distribute by and it doesn't have time
*SELECT bid, pid, rank FROM * *(SELECT bid, pid, rank(bid) rank, time, UNIX_TIMESTAMP(time) FROM * * ( SELECT bid, pid, time FROM table1* * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) = '2012-07-09' * * DISTRIBUTE BY bid,pid* * SORT BY bid,pid, time desc) A * * ) B * *WHERE rank < 3;* And also I tried running the above query as it is. I am not getting expected output instead of that I am getting output like this which is wrong If you compare my expected output with the below output- *1345653 110909316904 0* *1345653 221065796761 1* *1345653 330760137950 2* On Thu, Jul 19, 2012 at 3:43 PM, Igor Tatarinov <i...@decide.com> wrote: > Remove pid,time from DISTRIBUTE BY. > > On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky > <comptechge...@gmail.com>wrote: > >> Modified Query that I wrote and its not working as expected output is. >> >> * >> * >> *SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)* >> *FROM (* >> * SELECT bid, pid, time* >> * FROM table1* >> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) = >> '2012-07-09'* >> * DISTRIBUTE BY bid,pid,time* >> * SORT BY bid, time desc* >> *) a* >> *WHERE rank(bid) < 3;* >> >> >> >> >> >> On Thu, Jul 19, 2012 at 1:41 PM, comptech geeky >> <comptechge...@gmail.com>wrote: >> >>> I wrote this query after modifying it- >>> >>> *SELECT buyer_id, item_id, rank(buyer_id), created_time, >>> UNIX_TIMESTAMP(created_time)* >>> *FROM (* >>> * SELECT buyer_id, item_id, created_time* >>> * FROM testingtable1* >>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as >>> int))) = '2012-07-09'* >>> * DISTRIBUTE BY buyer_id,item_id* >>> * SORT BY buyer_id, created_time desc* >>> *) a* >>> *WHERE rank(buyer_id) < 3;* >>> >>> And the output I got is which is sligtly wrong as it is missing two rows- >>> >>> *1345653 330760137950 2012-07-09 21:43:29* >>> *1345653 330760137950 2012-07-09 21:42:29* >>> *1345653 330760137950 2012-07-09 21:41:29* >>> >>> These two rows are missing- >>> >>> 1345653 110909316904 2012-07-09 21:29:06 >>> 1345653 221065796761 2012-07-09 19:31:48 >>> >>> So full output should be like this- >>> >>> *1345653 330760137950 2012-07-09 21:43:29* >>> *1345653 330760137950 2012-07-09 21:42:29* >>> *1345653 330760137950 2012-07-09 21:41:29* >>> 1345653 110909316904 2012-07-09 21:29:06 >>> 1345653 221065796761 2012-07-09 19:31:48 >>> >>> >>> On Thu, Jul 19, 2012 at 1:29 PM, comptech geeky <comptechge...@gmail.com >>> > wrote: >>> >>>> Can you show me the exact query that I need to do for this particular >>>> problem consideing my scenario? It will be of great help to me. As I am new >>>> to HiveQL. >>>> >>>> I need TOP 3 for those if BID and PID gets matched but with different >>>> timestamp. >>>> >>>> >>>> >>>> On Thu, Jul 19, 2012 at 1:15 PM, Philip Tromans < >>>> philip.j.trom...@gmail.com> wrote: >>>> >>>>> Your rank() is being evaluated map side. Put your distribute by and >>>>> sort by in an inner query, and then evaluate your rank() in an outer >>>>> query. >>>>> >>>>> Phil. >>>>> On Jul 19, 2012 9:00 PM, "comptech geeky" <comptechge...@gmail.com> >>>>> wrote: >>>>> >>>>>> This is the below data in my Table1 >>>>>> >>>>>> >>>>>> BID PID TIME >>>>>> >>>>>> --------------+-------------------------+-------------------------------- >>>>>> 1345653 330760137950 2012-07-09 21:42:29 >>>>>> 1345653 330760137950 2012-07-09 21:43:29 >>>>>> 1345653 330760137950 2012-07-09 21:40:29 >>>>>> 1345653 330760137950 2012-07-09 21:41:29 >>>>>> 1345653 110909316904 2012-07-09 21:29:06 >>>>>> 1345653 221065796761 2012-07-09 19:31:48 >>>>>> >>>>>> So If I need to clarify the above scenario- I have data in above >>>>>> table like this- >>>>>> For USER *`1345653` *I have this PID `*330760137950` *four times but >>>>>> with different timestamps in red color. So I need the output something >>>>>> like >>>>>> this- >>>>>> >>>>>> Output that I need:- >>>>>> >>>>>> *1345653 330760137950 2012-07-09 21:43:29 * >>>>>> *1345653 330760137950 2012-07-09 21:42:29 * >>>>>> *1345653 330760137950 2012-07-09 21:41:29* >>>>>> 1345653 110909316904 2012-07-09 21:29:06 >>>>>> 1345653 221065796761 2012-07-09 19:31:48 >>>>>> >>>>>> So Basically If BID and PID are same but with different timestamps, >>>>>> then I need TOP 3 sorted with TIME in descending order >>>>>> >>>>>> And for this I created rank UDF (User Defined Function). And I wrote >>>>>> the below query but its not working for me. Can anyone help me on this? >>>>>> * >>>>>> * >>>>>> * >>>>>> * >>>>>> *SELECT buyer_id, item_id, created_time* >>>>>> * FROM table1* >>>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as >>>>>> int))) = '2012-07-09'* >>>>>> * DISTRIBUTE BY buyer_id* >>>>>> * SORT BY buyer_id, created_time desc* >>>>>> >>>>>> >>>> >>> >> >