I wrote this query after modifying it- *SELECT buyer_id, item_id, rank(buyer_id), created_time, UNIX_TIMESTAMP(created_time)* *FROM (* * SELECT buyer_id, item_id, created_time* * FROM testingtable1* * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as int))) = '2012-07-09'* * DISTRIBUTE BY buyer_id,item_id* * SORT BY buyer_id, created_time desc* *) a* *WHERE rank(buyer_id) < 3;*
And the output I got is which is sligtly wrong as it is missing two rows- *1345653 330760137950 2012-07-09 21:43:29* *1345653 330760137950 2012-07-09 21:42:29* *1345653 330760137950 2012-07-09 21:41:29* These two rows are missing- 1345653 110909316904 2012-07-09 21:29:06 1345653 221065796761 2012-07-09 19:31:48 So full output should be like this- *1345653 330760137950 2012-07-09 21:43:29* *1345653 330760137950 2012-07-09 21:42:29* *1345653 330760137950 2012-07-09 21:41:29* 1345653 110909316904 2012-07-09 21:29:06 1345653 221065796761 2012-07-09 19:31:48 On Thu, Jul 19, 2012 at 1:29 PM, comptech geeky <comptechge...@gmail.com>wrote: > Can you show me the exact query that I need to do for this particular > problem consideing my scenario? It will be of great help to me. As I am new > to HiveQL. > > I need TOP 3 for those if BID and PID gets matched but with different > timestamp. > > > > On Thu, Jul 19, 2012 at 1:15 PM, Philip Tromans < > philip.j.trom...@gmail.com> wrote: > >> Your rank() is being evaluated map side. Put your distribute by and sort >> by in an inner query, and then evaluate your rank() in an outer query. >> >> Phil. >> On Jul 19, 2012 9:00 PM, "comptech geeky" <comptechge...@gmail.com> >> wrote: >> >>> This is the below data in my Table1 >>> >>> >>> BID PID TIME >>> --------------+-------------------------+-------------------------------- >>> 1345653 330760137950 2012-07-09 21:42:29 >>> 1345653 330760137950 2012-07-09 21:43:29 >>> 1345653 330760137950 2012-07-09 21:40:29 >>> 1345653 330760137950 2012-07-09 21:41:29 >>> 1345653 110909316904 2012-07-09 21:29:06 >>> 1345653 221065796761 2012-07-09 19:31:48 >>> >>> So If I need to clarify the above scenario- I have data in above table >>> like this- >>> For USER *`1345653` *I have this PID `*330760137950` *four times but >>> with different timestamps in red color. So I need the output something like >>> this- >>> >>> Output that I need:- >>> >>> *1345653 330760137950 2012-07-09 21:43:29 * >>> *1345653 330760137950 2012-07-09 21:42:29 * >>> *1345653 330760137950 2012-07-09 21:41:29* >>> 1345653 110909316904 2012-07-09 21:29:06 >>> 1345653 221065796761 2012-07-09 19:31:48 >>> >>> So Basically If BID and PID are same but with different timestamps, then >>> I need TOP 3 sorted with TIME in descending order >>> >>> And for this I created rank UDF (User Defined Function). And I wrote the >>> below query but its not working for me. Can anyone help me on this? >>> * >>> * >>> * >>> * >>> *SELECT buyer_id, item_id, created_time* >>> * FROM table1* >>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as >>> int))) = '2012-07-09'* >>> * DISTRIBUTE BY buyer_id* >>> * SORT BY buyer_id, created_time desc* >>> >>> >