[ https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
BELUGA BEHR updated HIVE-21240:
-------------------------------
    Status: Patch Available  (was: Open)

Added a patch to fix the JSON writer when using derived column names (_c0, _c1, etc.).

OK. The Kafka_Handler Q-Test fails locally on trunk as well, so please ignore that UT failure. If Jenkins comes back clean, please consider accepting [^HIVE-21240.9.patch] for inclusion into the project.

Reads with this SerDe are a bit quicker; writes are a bit slower. I'm not exactly sure what makes the reads faster, but the slower writes are expected: the new writer makes fuller use of the Jackson library, whereas the current implementation uses its own very lightweight writing mechanism.

> JSON SerDe Re-Write
> -------------------
>
>                 Key: HIVE-21240
>                 URL: https://issues.apache.org/jira/browse/HIVE-21240
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 3.1.1, 4.0.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch,
> HIVE-21240.2.patch, HIVE-21240.3.patch, HIVE-21240.4.patch,
> HIVE-21240.5.patch, HIVE-21240.6.patch, HIVE-21240.7.patch,
> HIVE-21240.9.patch, HIVE-24240.8.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use the Jackson tree parser instead of parsing manually
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support for skipping blank lines (returns all columns as null values)
> * The current JSON parser accepts, but in most cases does not apply, custom
> timestamp formats
> * Added some unit tests
> * Added a cache for column-name to column-index lookups, which are currently
> O(n) for each column of each row processed
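
For readers unfamiliar with the last item in the description, here is a minimal sketch of what a column-name to column-index cache could look like, with a helper for the base-64 item as well. It is not taken from the attached patches: the class name `ColumnIndexCache`, its methods, and the case-insensitive matching are assumptions made for illustration, and only the Java standard library is used. The idea is to build the name-to-index map once per table so that each field lookup is a hash probe rather than a scan over all column names.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration only; not code from HIVE-21240. */
final class ColumnIndexCache {
    // Built once per table, then reused for every row and every field.
    private final Map<String, Integer> indexByName = new HashMap<>();

    ColumnIndexCache(List<String> columnNames) {
        for (int i = 0; i < columnNames.size(); i++) {
            // Assumption: names match case-insensitively; the first occurrence wins.
            indexByName.putIfAbsent(columnNames.get(i).toLowerCase(Locale.ROOT), i);
        }
    }

    /** Returns the column index for a JSON field name, or -1 if it is not a known column. */
    int indexOf(String fieldName) {
        return indexByName.getOrDefault(fieldName.toLowerCase(Locale.ROOT), -1);
    }

    /** Decodes a base-64 JSON string value into the raw bytes of a binary column. */
    static byte[] decodeBinary(String base64Text) {
        return Base64.getDecoder().decode(base64Text);
    }
}
```

With a cache like this, the per-row cost of resolving field names no longer grows with the number of columns for each individual lookup, which is the overhead the description calls out.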