[ https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779499#comment-16779499 ]
BELUGA BEHR commented on HIVE-21240:
------------------------------------

[~bslim]

With a large project like Hive, maintained by many different supporters and by countless additional troubleshooters who dig through the code to resolve issues, it is all the more important to adhere to best practices. With few exceptions, everything should be a Java Collection. Making smart choices about the actual data structures used (Set, Map, List, etc.) is going to yield much more benefit than trying to manipulate primitive arrays. I've never had a Hive user complain that they wished it were 2% faster, but I hear all the time about how complicated the product is and how difficult it is to troubleshoot.

There are a few books written on the topic which I won't regurgitate here, but I think this sums it up well:

https://stackoverflow.com/questions/6100148/collection-interface-vs-arrays

> JSON SerDe Re-Write
> -------------------
>
>                 Key: HIVE-21240
>                 URL: https://issues.apache.org/jira/browse/HIVE-21240
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 4.0.0, 3.1.1
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch,
> HIVE-21240.10.patch, HIVE-21240.11.patch, HIVE-21240.11.patch,
> HIVE-21240.11.patch, HIVE-21240.11.patch, HIVE-21240.2.patch,
> HIVE-21240.3.patch, HIVE-21240.4.patch, HIVE-21240.5.patch,
> HIVE-21240.6.patch, HIVE-21240.7.patch, HIVE-21240.9.patch,
> HIVE-24240.8.patch, kafka_storage_handler.diff
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats in most cases
> * Added some unit tests
> * Added cache for column-name to column-index searches, currently O(n) for each row processed, for each column in the row
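
For illustration, the last item in the list (caching column-name to column-index lookups) could look roughly like the sketch below. This is not the code from the attached patches; the class name `ColumnIndexCache` and its methods are hypothetical, and it assumes column names are matched exactly. It replaces a per-lookup `List.indexOf` scan, which is O(n), with a `HashMap` built once per table schema.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the Hive code base.
final class ColumnIndexCache {

    // Built once from the schema; each lookup afterwards is O(1) on average.
    private final Map<String, Integer> indexByName = new HashMap<>();

    ColumnIndexCache(List<String> columnNames) {
        for (int i = 0; i < columnNames.size(); i++) {
            // putIfAbsent keeps the first occurrence of a duplicate name,
            // which is the same position List.indexOf would report.
            indexByName.putIfAbsent(columnNames.get(i), i);
        }
    }

    // Returns the column position, or -1 if the name is unknown
    // (mirroring the List.indexOf convention).
    int indexOf(String columnName) {
        Integer index = indexByName.get(columnName);
        return index == null ? -1 : index;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name", "created");
        ColumnIndexCache cache = new ColumnIndexCache(schema);

        // Without the cache, each of these would be a linear scan of the
        // schema, repeated for every field of every row.
        System.out.println(cache.indexOf("name"));
        System.out.println(cache.indexOf("missing"));
    }
}
```

The map costs one pass over the schema up front, so it pays off as soon as more than a handful of rows are deserialized. It also keeps the lookup behind a standard Collection type, in line with the comment above.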