[ https://issues.apache.org/jira/browse/HIVE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401067#comment-13401067 ]

Ashutosh Chauhan commented on HIVE-3048:
----------------------------------------

+1
The existing implementation actually looks buggy to me. It checks for the 
existence of one object (p) and then adds a different object (its copy, pCopy). 
In the general case, the two objects may have different hash codes, and then you 
are screwed, since the check on p says nothing about whether pCopy is already in 
the set. It will work, however, as long as the underlying object's hash code is 
based on its value, which is true for primitive types and for containers of 
primitive types, and therefore for Hive datatypes. It's always good practice to 
just add your objects to the set and let the set take care of duplicate 
elimination.
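Here is a minimal, self-contained Java sketch of the hash-code point above. It uses plain java.util rather than Hive classes, and the Box class and both method names are hypothetical, for illustration only. With value-based equals/hashCode (as List provides), adding a fresh copy each time still de-duplicates with no contains() check. With identity-based hashing, probing the set with the original object never finds its copy.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetDedupDemo {
    // Hypothetical value holder WITHOUT equals/hashCode overrides: it inherits
    // identity-based hashing from Object, so a copy never equals its original.
    static final class Box {
        final int value;
        Box(int value) { this.value = value; }
        Box copy() { return new Box(value); }
    }

    // Value-based hashing (List compares by contents): adding a fresh copy of
    // every row still de-duplicates, because add() rejects equal elements.
    static int distinctCopies(List<List<Integer>> rows) {
        Set<List<Integer>> set = new HashSet<>();
        for (List<Integer> row : rows) {
            set.add(new ArrayList<>(row)); // returns false for a duplicate
        }
        return set.size();
    }

    // Identity-based hashing: the set holds only the copy, so a contains()
    // probe with the original object cannot find it.
    static boolean originalFoundAfterAddingCopy(int v) {
        Set<Box> set = new HashSet<>();
        Box original = new Box(v);
        set.add(original.copy());
        return set.contains(original);
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = List.of(List.of(1, 2), List.of(1, 2), List.of(3));
        System.out.println(distinctCopies(rows));            // prints 2
        System.out.println(originalFoundAfterAddingCopy(7)); // prints false
    }
}
```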
                
> Collect_set Aggregate does unnecessary check for value.
> -------------------------------------------------------
>
>                 Key: HIVE-3048
>                 URL: https://issues.apache.org/jira/browse/HIVE-3048
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.8.1
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>         Attachments: HIVE-3048.patch.1.txt
>
>
> Sets already de-duplicate for free; there is no need for an existence check.
> {noformat}
>     private void putIntoSet(Object p, MkArrayAggregationBuffer myagg) {
>       // Redundant: the Set already rejects duplicates in add().
>       if (myagg.container.contains(p))
>         return;
>       Object pCopy = ObjectInspectorUtils.copyToStandardObject(p,
>           this.inputOI);
>       myagg.container.add(pCopy);
>     }
> {noformat}
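A sketch of putIntoSet with the existence check dropped, as the description proposes. This is not the attached patch: AggBuffer and copyToStandard are hypothetical stand-ins for MkArrayAggregationBuffer and ObjectInspectorUtils.copyToStandardObject, so the example runs without Hive on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CollectSetSketch {
    // Hypothetical stand-in for Hive's MkArrayAggregationBuffer.
    static final class AggBuffer {
        final Set<Object> container = new LinkedHashSet<>();
    }

    // Hypothetical stand-in for ObjectInspectorUtils.copyToStandardObject:
    // deep-copies lists so the buffer never holds caller-owned objects, and
    // treats every other value (String, Integer, ...) as immutable.
    static Object copyToStandard(Object p) {
        if (p instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) p) {
                copy.add(copyToStandard(item));
            }
            return copy;
        }
        return p;
    }

    // No contains() pre-check: copy the value, then let the Set reject
    // duplicates based on the copy's own equals/hashCode.
    static void putIntoSet(Object p, AggBuffer myagg) {
        Object pCopy = copyToStandard(p);
        myagg.container.add(pCopy);
    }

    public static void main(String[] args) {
        AggBuffer agg = new AggBuffer();
        putIntoSet("a", agg);
        putIntoSet("b", agg);
        putIntoSet("a", agg);
        System.out.println(agg.container); // prints [a, b]
    }
}
```

Without the pre-check, every input value is copied, including values that turn out to be duplicates; correctness then depends only on the copies having value-based equals/hashCode.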

--
This message is automatically generated by JIRA.

        
