Hi Raj,

You'll have to change the format of your date to something like YYYY-MM-DD.
For example, for "2-oct-2013" it will be 2013-10-02.

Best Regards,
Nishant Kelkar



On Wed, Sep 10, 2014 at 11:48 AM, Raj Hadoop <hadoop...@yahoo.com> wrote:

> The
>
> SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
>
> is returning the lowest date. I need the largest date.
>
>
>
> --------------------------------------------
> On Wed, 9/10/14, Raj Hadoop <hadoop...@yahoo.com> wrote:
>
>  Subject: Re: Remove duplicate records in Hive
>  To: user@hive.apache.org
>  Date: Wednesday, September 10, 2014, 2:41 PM
>
>  Thanks. I will try it.
>  --------------------------------------------
>  On Wed, 9/10/14, Nishant Kelkar <nishant....@gmail.com>
>  wrote:
>
>   Subject: Re: Remove
>  duplicate records in Hive
>   To: user@hive.apache.org,
>  hadoop...@yahoo.com
>   Date: Wednesday, September 10, 2014, 1:59
>  PM
>
>   Hi
>
>  Raj,
>   You can do something
>   along these lines:
>
>   SELECT
>   cno, sqno,
>  SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
>   FROM table GROUP BY cno, sqno;
>   However, you have to make sure your
>   date format is such that sorting it gives you
>  the most
>   recent date. The best way to do
>  that is to have it in
>   format:
>  YYYY-MM-DD.
>   Hope this helps.
>   Best Regards,Nishant
>
>  Kelkar
>   On Wed, Sep 10, 2014 at
>   10:04 AM, Raj Hadoop <hadoop...@yahoo.com>
>   wrote:
>
>
>   Hi,
>
>
>
>   I have a requirement in Hive
>  to remove duplicate records (
>   they differ
>  only by one column i.e a date column) and keep
>   the latest date record.
>
>
>
>   Sample
>  :
>
>   Hive Table :
>
>    d2 is a higher
>
>   cno,sqno,date
>
>
>
>   100 1 1-oct-2013
>
>   101 2 1-oct-2013
>
>   100 1 2-oct-2013
>
>   102 2 2-oct-2013
>
>
>
>
>
>   Output needed:
>
>
>
>   100 1 2-oct-2013
>
>   101 2 1-oct-2013
>
>   102 2 2-oct-2013
>
>
>
>   I am using
>  Hive 0.11
>
>
>
>   Any suggestions please ?
>
>
>
>   Regards,
>
>
>  Raj
>
>
>
>

Reply via email to