Re: Remove duplicate records in Hive

2014-09-11 Thread Raj Hadoop
like YYYY-MM-DD. For example, for "2-oct-2013" it will be 2013-10-02. Best Regards, Nishant Kelkar
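A minimal sketch of the normalization described above, for readers who want to see why the format matters. It is written in Python rather than Hive, the function name `to_iso_date` and the sample values (other than "2-oct-2013") are made up for illustration, and it assumes every input has the day-month-year shape shown in the thread.

```python
# Month abbreviations mapped to two-digit month numbers.
# A plain dict keeps the conversion independent of the system locale.
MONTHS = {
    "jan": "01", "feb": "02", "mar": "03", "apr": "04",
    "may": "05", "jun": "06", "jul": "07", "aug": "08",
    "sep": "09", "oct": "10", "nov": "11", "dec": "12",
}


def to_iso_date(text):
    """Convert a 'D-mon-YYYY' string such as '2-oct-2013' to 'YYYY-MM-DD'."""
    day, mon, year = text.strip().split("-")
    return f"{year}-{MONTHS[mon.lower()]}-{int(day):02d}"


# Hypothetical sample values in the original day-month-year form.
raw = ["2-oct-2013", "15-jan-2014", "30-sep-2013"]

# Sorting the raw strings compares characters, not dates.
raw_sorted = sorted(raw)

# Sorting the normalized strings matches chronological order.
iso_sorted = sorted(to_iso_date(d) for d in raw)
```

With the raw strings, a plain string sort puts "15-jan-2014" first even though it is the most recent date. After normalization, string order and date order agree, which is what sorting the collected dates relies on.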

Re: Remove duplicate records in Hive

2014-09-10 Thread vivek thakre
The SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date is returning the lowest date. I need the largest date.

Re: Remove duplicate records in Hive

2014-09-10 Thread Nishant Kelkar
On Wed, Sep 10, 2014 at 11:48 AM, Raj Hadoop wrote: The SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date is returning the lowest date. I need the largest date.

Re: Remove duplicate records in Hive

2014-09-10 Thread Raj Hadoop
On Wed, 9/10/14, Raj Hadoop wrote: Subject: Re: Remove duplicate records in Hive To: user@hive.apache.org Date: Wednesday, September 10, 2014, 2:41 PM Thanks. I will try it.

Re: Remove duplicate records in Hive

2014-09-10 Thread Nishant Kelkar
SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date is returning the lowest date. I need the largest date.

Re: Remove duplicate records in Hive

2014-09-10 Thread Raj Hadoop
The SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date is returning the lowest date. I need the largest date. On Wed, 9/10/14, Raj Hadoop wrote: Subject: Re: Remove duplicate records in Hive To: user@hive.apache.org Date: Wednesday, September 10
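A note on why index 0 gives the lowest date: an ascending sort puts the smallest value first, so the largest is the last element, or simply the maximum of the group. The sketch below shows both, using Python's built-in sqlite3 as a stand-in for Hive. The table name `records`, the column name `dt`, and the sample rows are assumptions for illustration, and the query has not been run against Hive itself. It also assumes the dates are already stored as YYYY-MM-DD strings.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (cno INTEGER, sqno INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, 10, "2013-10-02"),
        (1, 10, "2014-01-15"),
        (1, 10, "2013-10-02"),  # duplicate of the first row
        (2, 20, "2013-09-30"),
    ],
)

# One row per (cno, sqno), keeping the largest date in each group.
query = """
    SELECT cno, sqno, MAX(dt) AS latest_date
    FROM records
    GROUP BY cno, sqno
    ORDER BY cno, sqno
"""
latest = conn.execute(query).fetchall()

# The same idea on a plain list: ascending sort, first vs. last element.
dates = sorted({"2013-10-02", "2014-01-15"})
lowest, largest = dates[0], dates[-1]
```

Here `latest` holds one row for each (cno, sqno) pair with its most recent date, and `dates[-1]` is the counterpart of taking the last element of the sorted array instead of element 0.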

Re: Remove duplicate records in Hive

2014-09-10 Thread Raj Hadoop
Thanks. I will try it. On Wed, 9/10/14, Nishant Kelkar wrote: Subject: Re: Remove duplicate records in Hive To: user@hive.apache.org, hadoop...@yahoo.com Date: Wednesday, September 10, 2014, 1:59 PM Hi Raj,  You can do something along these

Re: Remove duplicate records in Hive

2014-09-10 Thread Nishant Kelkar
Hi Raj, You can do something along these lines: SELECT cno, sqno, SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date FROM table GROUP BY cno, sqno; However, you have to make sure your date format is such that sorting it gives you the most recent date. The best way to do that is to have it in format

Re: Remove duplicate records in Hive

2014-09-10 Thread Kevin Weiler
Whoops, thought this was someone in my office, so obviously you can’t come see me :) -- Kevin Weiler IT IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/ Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.wei...@imc-chicago.com

Re: Remove duplicate records in Hive

2014-09-10 Thread Kevin Weiler
If you can just query the table for your results, you can do a SELECT DISTINCT instead of just a SELECT. If you give me a bit more information about where the duplicate data is coming from, I can provide a bit more detail. You can come see me on the end of desk. -- Kevin Weiler IT IMC Financial
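A small sketch of what SELECT DISTINCT does and does not remove, again using Python's built-in sqlite3 as a stand-in for Hive. The table name `records`, the column name `dt`, and the rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (cno INTEGER, sqno INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, 10, "2013-10-02"),
        (1, 10, "2013-10-02"),  # exact duplicate: removed by DISTINCT
        (1, 10, "2014-01-15"),  # same keys, different date: kept
    ],
)

# DISTINCT collapses rows only when every selected column matches.
distinct_rows = conn.execute(
    "SELECT DISTINCT cno, sqno, dt FROM records ORDER BY dt"
).fetchall()
```

The exact duplicate disappears, but the row that differs only in its date survives, so DISTINCT alone is enough only when the duplicates are identical in every selected column.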