[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769049#comment-13769049 ]
Mohammad Kamrul Islam commented on HIVE-4732:
---------------------------------------------

[~appodictic]: I can see your point, and the link is indeed very informative. As it notes, the probability of an ID collision is extremely low.

Pasted from Wikipedia:

"To put these numbers into perspective, the annual risk of someone being hit by a meteorite is estimated to be one chance in 17 billion,[38] which means the probability is about 0.00000000006 (6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs."

With probabilities this low, is it necessary to make things more complex? Moreover, there are usually only a few of these IDs in a single Hive session.

> Reduce or eliminate the expensive Schema equals() check for AvroSerde
> ---------------------------------------------------------------------
>
>                 Key: HIVE-4732
>                 URL: https://issues.apache.org/jira/browse/HIVE-4732
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Mark Wagner
>            Assignee: Mohammad Kamrul Islam
>         Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch
>
>
> The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once then reused) will improve performance.
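
As a rough check on the collision odds quoted in the comment above, here is a minimal sketch of the standard birthday-bound approximation, p ≈ 1 - exp(-n^2 / (2 * 2^bits)). It assumes the IDs are random (version-4) UUIDs with 122 random bits, which the thread does not state explicitly. The class and method names are hypothetical and are not part of any HIVE-4732 patch.

```java
// Illustrative only: names below are hypothetical, not taken from Hive.
class UuidCollisionEstimate {
    // Assumption: version-4 UUIDs, i.e. 128 bits minus 6 fixed version/variant bits.
    static final int RANDOM_BITS = 122;

    /**
     * Birthday-bound approximation of the probability that at least two of
     * n uniformly random IDs drawn from a space of 2^bits values collide.
     */
    static double collisionProbability(double n, int bits) {
        double space = Math.pow(2.0, bits);
        double exponent = (n * n) / (2.0 * space);
        // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
        return -Math.expm1(-exponent);
    }

    public static void main(String[] args) {
        // A generous guess at the number of IDs in one session.
        System.out.println(collisionProbability(1_000, RANDOM_BITS));
        // Roughly the count at which the approximation reaches one half.
        System.out.println(collisionProbability(2.71e18, RANDOM_BITS));
    }
}
```

Under these assumptions, even a thousand IDs in a session gives a collision probability many orders of magnitude below the meteorite figure in the quote, while about 2.7 × 10^18 IDs are needed before the estimate approaches 50%.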