Sorry, just pid needs to be dropped from both DISTRIBUTE and SORT clauses. Your very first query was correct except for the nested subquery part. (You don't need a double-nested subquery.)
On Thu, Jul 19, 2012 at 3:48 PM, comptech geeky <comptechge...@gmail.com>wrote: > Hi Igor, > > I am not sure what I have to remove from Distribute By as in distribute by > we have bid, pid and you said remove bid and time from distribute by and it > doesn't have time > > *SELECT bid, pid, rank FROM * > *(SELECT bid, pid, rank(bid) rank, time, UNIX_TIMESTAMP(time) FROM > * > * > ( SELECT bid, pid, time FROM table1* > * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) = > '2012-07-09' * > * DISTRIBUTE BY bid,pid* > * SORT BY bid,pid, time desc) A > * > * > ) B > * > *WHERE rank < 3;* > > > And also I tried running the above query as it is. I am not getting > expected output instead of that I am getting output like this which is > wrong If you compare my expected output with the below output- > > *1345653 110909316904 0* > *1345653 221065796761 1* > *1345653 330760137950 2* > > > On Thu, Jul 19, 2012 at 3:43 PM, Igor Tatarinov <i...@decide.com> wrote: > >> Remove pid,time from DISTRIBUTE BY. >> >> On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky >> <comptechge...@gmail.com>wrote: >> >>> Modified Query that I wrote and its not working as expected output is. >>> >>> * >>> * >>> *SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)* >>> *FROM (* >>> * SELECT bid, pid, time* >>> * FROM table1* >>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) = >>> '2012-07-09'* >>> * DISTRIBUTE BY bid,pid,time* >>> * SORT BY bid, time desc* >>> *) a* >>> *WHERE rank(bid) < 3;* >>> >>> >>> >>> >>> >>> On Thu, Jul 19, 2012 at 1:41 PM, comptech geeky <comptechge...@gmail.com >>> > wrote: >>> >>>> I wrote this query after modifying it- >>>> >>>> *SELECT buyer_id, item_id, rank(buyer_id), created_time, >>>> UNIX_TIMESTAMP(created_time)* >>>> *FROM (* >>>> * SELECT buyer_id, item_id, created_time* >>>> * FROM testingtable1* >>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as >>>> int))) = '2012-07-09'* >>>> * DISTRIBUTE BY buyer_id,item_id* >>>> * SORT BY buyer_id, created_time desc* >>>> *) a* >>>> *WHERE rank(buyer_id) < 3;* >>>> >>>> And the output I got is which is sligtly wrong as it is missing two >>>> rows- >>>> >>>> *1345653 330760137950 2012-07-09 21:43:29* >>>> *1345653 330760137950 2012-07-09 21:42:29* >>>> *1345653 330760137950 2012-07-09 21:41:29* >>>> >>>> These two rows are missing- >>>> >>>> 1345653 110909316904 2012-07-09 21:29:06 >>>> 1345653 221065796761 2012-07-09 19:31:48 >>>> >>>> So full output should be like this- >>>> >>>> *1345653 330760137950 2012-07-09 21:43:29* >>>> *1345653 330760137950 2012-07-09 21:42:29* >>>> *1345653 330760137950 2012-07-09 21:41:29* >>>> 1345653 110909316904 2012-07-09 21:29:06 >>>> 1345653 221065796761 2012-07-09 19:31:48 >>>> >>>> >>>> On Thu, Jul 19, 2012 at 1:29 PM, comptech geeky < >>>> comptechge...@gmail.com> wrote: >>>> >>>>> Can you show me the exact query that I need to do for this particular >>>>> problem consideing my scenario? It will be of great help to me. As I am >>>>> new >>>>> to HiveQL. >>>>> >>>>> I need TOP 3 for those if BID and PID gets matched but with different >>>>> timestamp. >>>>> >>>>> >>>>> >>>>> On Thu, Jul 19, 2012 at 1:15 PM, Philip Tromans < >>>>> philip.j.trom...@gmail.com> wrote: >>>>> >>>>>> Your rank() is being evaluated map side. Put your distribute by and >>>>>> sort by in an inner query, and then evaluate your rank() in an outer >>>>>> query. >>>>>> >>>>>> Phil. >>>>>> On Jul 19, 2012 9:00 PM, "comptech geeky" <comptechge...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> This is the below data in my Table1 >>>>>>> >>>>>>> >>>>>>> BID PID TIME >>>>>>> >>>>>>> --------------+-------------------------+-------------------------------- >>>>>>> 1345653 330760137950 2012-07-09 21:42:29 >>>>>>> 1345653 330760137950 2012-07-09 21:43:29 >>>>>>> 1345653 330760137950 2012-07-09 21:40:29 >>>>>>> 1345653 330760137950 2012-07-09 21:41:29 >>>>>>> 1345653 110909316904 2012-07-09 21:29:06 >>>>>>> 1345653 221065796761 2012-07-09 19:31:48 >>>>>>> >>>>>>> So If I need to clarify the above scenario- I have data in above >>>>>>> table like this- >>>>>>> For USER *`1345653` *I have this PID `*330760137950` *four times >>>>>>> but with different timestamps in red color. So I need the output >>>>>>> something >>>>>>> like this- >>>>>>> >>>>>>> Output that I need:- >>>>>>> >>>>>>> *1345653 330760137950 2012-07-09 21:43:29 * >>>>>>> *1345653 330760137950 2012-07-09 21:42:29 * >>>>>>> *1345653 330760137950 2012-07-09 21:41:29* >>>>>>> 1345653 110909316904 2012-07-09 21:29:06 >>>>>>> 1345653 221065796761 2012-07-09 19:31:48 >>>>>>> >>>>>>> So Basically If BID and PID are same but with different timestamps, >>>>>>> then I need TOP 3 sorted with TIME in descending order >>>>>>> >>>>>>> And for this I created rank UDF (User Defined Function). And I wrote >>>>>>> the below query but its not working for me. Can anyone help me on this? >>>>>>> * >>>>>>> * >>>>>>> * >>>>>>> * >>>>>>> *SELECT buyer_id, item_id, created_time* >>>>>>> * FROM table1* >>>>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as >>>>>>> int))) = '2012-07-09'* >>>>>>> * DISTRIBUTE BY buyer_id* >>>>>>> * SORT BY buyer_id, created_time desc* >>>>>>> >>>>>>> >>>>> >>>> >>> >> >