Hi Raj, You'll have to change the format of your date to something like YYYY-MM-DD. For example, for "2-oct-2013" it will be 2013-10-02.
Best Regards, Nishant Kelkar On Wed, Sep 10, 2014 at 11:48 AM, Raj Hadoop <hadoop...@yahoo.com> wrote: > The > > SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date > > is returning the lowest date. I need the largest date. > > > > -------------------------------------------- > On Wed, 9/10/14, Raj Hadoop <hadoop...@yahoo.com> wrote: > > Subject: Re: Remove duplicate records in Hive > To: user@hive.apache.org > Date: Wednesday, September 10, 2014, 2:41 PM > > Thanks. I will try it. > -------------------------------------------- > On Wed, 9/10/14, Nishant Kelkar <nishant....@gmail.com> > wrote: > > Subject: Re: Remove > duplicate records in Hive > To: user@hive.apache.org, > hadoop...@yahoo.com > Date: Wednesday, September 10, 2014, 1:59 > PM > > Hi > > Raj, > You can do something > along these lines: > > SELECT > cno, sqno, > SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date > FROM table GROUP BY cno, sqno; > However, you have to make sure your > date format is such that sorting it gives you > the most > recent date. The best way to do > that is to have it in > format: > YYYY-MM-DD. > Hope this helps. > Best Regards,Nishant > > Kelkar > On Wed, Sep 10, 2014 at > 10:04 AM, Raj Hadoop <hadoop...@yahoo.com> > wrote: > > > Hi, > > > > I have a requirement in Hive > to remove duplicate records ( > they differ > only by one column i.e a date column) and keep > the latest date record. > > > > Sample > : > > Hive Table : > > d2 is a higher > > cno,sqno,date > > > > 100 1 1-oct-2013 > > 101 2 1-oct-2013 > > 100 1 2-oct-2013 > > 102 2 2-oct-2013 > > > > > > Output needed: > > > > 100 1 2-oct-2013 > > 101 2 1-oct-2013 > > 102 2 2-oct-2013 > > > > I am using > Hive 0.11 > > > > Any suggestions please ? > > > > Regards, > > > Raj > > > >