Modified Query that I wrote and its not working as expected output is. * * *SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)* *FROM (* * SELECT bid, pid, time* * FROM table1* * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) = '2012-07-09'* * DISTRIBUTE BY bid,pid,time* * SORT BY bid, time desc* *) a* *WHERE rank(bid) < 3;*
On Thu, Jul 19, 2012 at 1:41 PM, comptech geeky <comptechge...@gmail.com>wrote: > I wrote this query after modifying it- > > *SELECT buyer_id, item_id, rank(buyer_id), created_time, > UNIX_TIMESTAMP(created_time)* > *FROM (* > * SELECT buyer_id, item_id, created_time* > * FROM testingtable1* > * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as int))) > = '2012-07-09'* > * DISTRIBUTE BY buyer_id,item_id* > * SORT BY buyer_id, created_time desc* > *) a* > *WHERE rank(buyer_id) < 3;* > > And the output I got is which is sligtly wrong as it is missing two rows- > > *1345653 330760137950 2012-07-09 21:43:29* > *1345653 330760137950 2012-07-09 21:42:29* > *1345653 330760137950 2012-07-09 21:41:29* > > These two rows are missing- > > 1345653 110909316904 2012-07-09 21:29:06 > 1345653 221065796761 2012-07-09 19:31:48 > > So full output should be like this- > > *1345653 330760137950 2012-07-09 21:43:29* > *1345653 330760137950 2012-07-09 21:42:29* > *1345653 330760137950 2012-07-09 21:41:29* > 1345653 110909316904 2012-07-09 21:29:06 > 1345653 221065796761 2012-07-09 19:31:48 > > > On Thu, Jul 19, 2012 at 1:29 PM, comptech geeky > <comptechge...@gmail.com>wrote: > >> Can you show me the exact query that I need to do for this particular >> problem consideing my scenario? It will be of great help to me. As I am new >> to HiveQL. >> >> I need TOP 3 for those if BID and PID gets matched but with different >> timestamp. >> >> >> >> On Thu, Jul 19, 2012 at 1:15 PM, Philip Tromans < >> philip.j.trom...@gmail.com> wrote: >> >>> Your rank() is being evaluated map side. Put your distribute by and sort >>> by in an inner query, and then evaluate your rank() in an outer query. >>> >>> Phil. >>> On Jul 19, 2012 9:00 PM, "comptech geeky" <comptechge...@gmail.com> >>> wrote: >>> >>>> This is the below data in my Table1 >>>> >>>> >>>> BID PID TIME >>>> >>>> --------------+-------------------------+-------------------------------- >>>> 1345653 330760137950 2012-07-09 21:42:29 >>>> 1345653 330760137950 2012-07-09 21:43:29 >>>> 1345653 330760137950 2012-07-09 21:40:29 >>>> 1345653 330760137950 2012-07-09 21:41:29 >>>> 1345653 110909316904 2012-07-09 21:29:06 >>>> 1345653 221065796761 2012-07-09 19:31:48 >>>> >>>> So If I need to clarify the above scenario- I have data in above table >>>> like this- >>>> For USER *`1345653` *I have this PID `*330760137950` *four times but >>>> with different timestamps in red color. So I need the output something like >>>> this- >>>> >>>> Output that I need:- >>>> >>>> *1345653 330760137950 2012-07-09 21:43:29 * >>>> *1345653 330760137950 2012-07-09 21:42:29 * >>>> *1345653 330760137950 2012-07-09 21:41:29* >>>> 1345653 110909316904 2012-07-09 21:29:06 >>>> 1345653 221065796761 2012-07-09 19:31:48 >>>> >>>> So Basically If BID and PID are same but with different timestamps, >>>> then I need TOP 3 sorted with TIME in descending order >>>> >>>> And for this I created rank UDF (User Defined Function). And I wrote >>>> the below query but its not working for me. Can anyone help me on this? >>>> * >>>> * >>>> * >>>> * >>>> *SELECT buyer_id, item_id, created_time* >>>> * FROM table1* >>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as >>>> int))) = '2012-07-09'* >>>> * DISTRIBUTE BY buyer_id* >>>> * SORT BY buyer_id, created_time desc* >>>> >>>> >> >