[jira] [Commented] (HIVE-21240) JSON SerDe Deserialize Re-Write

BELUGA BEHR (JIRA) Wed, 13 Feb 2019 05:40:47 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767202#comment-16767202
 ]


BELUGA BEHR commented on HIVE-21240:
------------------------------------

[~kgyrtkirk] Thank you for the review!!

# I think that it's best to let the JSON library, which is specialized in 
parsing, do the work.  It certainly cuts down on code that Hive needs to 
maintain.  Yes, the tree has to be built upfront, but as things currently 
stand, there are very few scenarios where the entire tree isn't traversed.  I 
guess that if there is an "unknown" field in the text, the field is ignored 
and its JSON string value is never parsed into a data type.  I don't know 
that this is a scenario that needs to be optimized for.  I have not seen many 
situations where a customer actively wants to ignore a field.
# I think it's best to work with Java {{Collections}} over Java native arrays.  
The only thing that happens with the results of the SerDe is that they are 
iterated over.  The passing Qtests bear this out, so I think 4.0 is a good 
time to make that change.
# I am sorry about the reformatted lines.  Since I was touching a lot of code 
in the JsonSerde, I thought it might be helpful to clean up some checkstyle 
issues while I was in there.  I will revert.  Thank you for pointing me at the 
Hive formatter; I have been using the Hadoop formatter for a year+.
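
To make points 1 and 2 concrete, here is a minimal sketch, not the code from 
the patch.  A plain {{Map}} stands in for the tree that the JSON library would 
build, so the example has no external dependency.  The class name 
{{TreeRowSketch}} and the method {{toRow}} are hypothetical.  It shows that 
only declared columns are read from the tree, that an unknown field is never 
visited, and that the row comes back as a {{List}} that callers iterate over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual Hive JsonSerDe implementation.
class TreeRowSketch {

  // Walks the declared columns and pulls each value out of the parsed tree.
  // Fields that are in the tree but not in the schema are never looked at.
  static List<Object> toRow(Map<String, Object> tree, List<String> columns) {
    List<Object> row = new ArrayList<>(columns.size());
    for (String column : columns) {
      row.add(tree.get(column)); // null when the field is absent from the text
    }
    return row;
  }

  public static void main(String[] args) {
    // Stand-in for the tree a JSON library would build from one line of text.
    Map<String, Object> tree = new LinkedHashMap<>();
    tree.put("id", 7);
    tree.put("name", "hive");
    tree.put("unknown", "never read");

    List<Object> row = toRow(tree, Arrays.asList("id", "name", "price"));
    // The only thing the caller does with the result is iterate over it.
    for (Object value : row) {
      System.out.println(value);
    }
  }
}
```

In this sketch the cost of the "unknown" field is only the up-front tree 
build; no per-type conversion is ever attempted for it.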

Thanks again!

> JSON SerDe Deserialize Re-Write
> -------------------------------
>
>                 Key: HIVE-21240
>                 URL: https://issues.apache.org/jira/browse/HIVE-21240
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 4.0.0, 3.1.1
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.2.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats 
> in most cases
> * Added some unit tests
> * Added cache for column-name to column-index searches, currently O(n) for 
> each row processed, for each column in the row
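
As a rough illustration of two items in the list above, the sketch below 
caches column-name to column-index lookups and decodes a base-64 string into 
bytes.  It is an assumption-laden example, not the code from the attached 
patches: the class name {{ColumnIndexCacheSketch}} and its methods are 
hypothetical, and it uses only the JDK ({{HashMap}} and {{java.util.Base64}}).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual Hive JsonSerDe implementation.
class ColumnIndexCacheSketch {
  private final List<String> columnNames;
  private final Map<String, Integer> indexByName = new HashMap<>();

  ColumnIndexCacheSketch(List<String> columnNames) {
    this.columnNames = columnNames;
  }

  // The first lookup of a name scans the column list (O(n)); later lookups
  // of the same name are served from the map.  Unknown names cache as -1.
  int indexOf(String fieldName) {
    return indexByName.computeIfAbsent(fieldName, columnNames::indexOf);
  }

  // Decodes a base-64 JSON string value into raw bytes.
  static byte[] decodeBinary(String base64Text) {
    return Base64.getDecoder().decode(base64Text);
  }

  public static void main(String[] args) {
    ColumnIndexCacheSketch cache =
        new ColumnIndexCacheSketch(Arrays.asList("id", "name", "payload"));
    System.out.println(cache.indexOf("payload"));
    System.out.println(cache.indexOf("not_a_column"));
    byte[] bytes = decodeBinary("aGl2ZQ==");
    System.out.println(new String(bytes, StandardCharsets.UTF_8));
  }
}
```

With the cache in place, the linear scan happens once per distinct field name 
rather than once per column per row.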



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
