[ https://issues.apache.org/jira/browse/HIVE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401067#comment-13401067 ]
Ashutosh Chauhan commented on HIVE-3048:
----------------------------------------

+1

The existing implementation actually looks buggy to me. It checks for the existence of one object and then adds a different object (the copy). In the general case, the two objects may have different hashcodes, and then you are screwed. It will work, however, as long as the underlying object's hashcode is based on its value, which is the case for primitive types and for containers of primitive types, and therefore for Hive datatypes. It is always good practice to just add your objects to the set and let the set take care of duplicate elimination.

> Collect_set Aggregate does uneccesary check for value.
> ------------------------------------------------------
>
>                 Key: HIVE-3048
>                 URL: https://issues.apache.org/jira/browse/HIVE-3048
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.8.1
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>         Attachments: HIVE-3048.patch.1.txt
>
>
> Sets already de-duplicate for free, so there is no need for an existence check.
> {noformat}
>   private void putIntoSet(Object p, MkArrayAggregationBuffer myagg) {
>     if (myagg.container.contains(p))
>       return;
>     Object pCopy = ObjectInspectorUtils.copyToStandardObject(p,
>         this.inputOI);
>     myagg.container.add(pCopy);
>   }
> {noformat}
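
As a rough illustration of the point made in the comment, the sketch below compares the check-then-add pattern with a plain add. It is not Hive code: `copyToStandard` is a hypothetical stand-in for `ObjectInspectorUtils.copyToStandardObject`, `List<Integer>` stands in for a Hive value whose `hashCode` is value-based, and the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CollectSetSketch {

    // Hypothetical stand-in for copyToStandardObject: a fresh, value-equal copy.
    static List<Integer> copyToStandard(List<Integer> p) {
        return new ArrayList<>(p);
    }

    // Pattern from the issue: look up the original, then add the copy.
    static void putWithCheck(List<Integer> p, Set<List<Integer>> container) {
        if (container.contains(p)) {
            return;
        }
        container.add(copyToStandard(p));
    }

    // Simplified pattern: always add the copy; Set.add leaves the set
    // unchanged (and returns false) when an equal element is already present.
    static void putWithoutCheck(List<Integer> p, Set<List<Integer>> container) {
        container.add(copyToStandard(p));
    }

    // Collects rows with or without the explicit existence check.
    static Set<List<Integer>> collect(List<List<Integer>> rows, boolean check) {
        Set<List<Integer>> container = new LinkedHashSet<>();
        for (List<Integer> row : rows) {
            if (check) {
                putWithCheck(row, container);
            } else {
                putWithoutCheck(row, container);
            }
        }
        return container;
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(1, 2), Arrays.asList(3));
        Set<List<Integer>> a = collect(rows, true);
        Set<List<Integer>> b = collect(rows, false);
        System.out.println(a.equals(b)); // true
        System.out.println(b.size());    // 2
    }
}
```

Both variants give the same result here only because `List.equals` and `List.hashCode` are defined by value, so the original and its copy are interchangeable for lookup. The version without the check does not rely on that relationship between the original and the copy; it only needs the copies themselves to compare by value.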