Hi Raj,

You can do something along these lines:

SELECT cno, sqno, SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date FROM
table GROUP BY cno, sqno;

However, you have to make sure your date format is such that sorting it
gives you the most recent date. The best way to do that is to have it in
format: YYYY-MM-DD.

Hope this helps.

Best Regards,
Nishant Kelkar

On Wed, Sep 10, 2014 at 10:04 AM, Raj Hadoop <hadoop...@yahoo.com> wrote:

>
> Hi,
>
> I have a requirement in Hive to remove duplicate records ( they differ
> only by one column i.e a date column) and keep the latest date record.
>
> Sample :
> Hive Table :
>  d2 is a higher
> cno,sqno,date
>
> 100 1 1-oct-2013
> 101 2 1-oct-2013
> 100 1 2-oct-2013
> 102 2 2-oct-2013
>
>
> Output needed:
>
> 100 1 2-oct-2013
> 101 2 1-oct-2013
> 102 2 2-oct-2013
>
> I am using Hive 0.11
>
> Any suggestions please ?
>
> Regards,
> Raj
>

Reply via email to