[ https://issues.apache.org/jira/browse/HIVE-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776298#comment-16776298 ]
BELUGA BEHR commented on HIVE-15475:
------------------------------------

Nope. OK. Figured it out.

This issue was inadvertently fixed as part of [HIVE-18545] (Jul 10, 2018). Prior to this change, the JSON stuff was handled by {{org.apache.hive.hcatalog.data.JsonSerDe}}.

The issue was that this class was not handling the provided {{Text}} object correctly. The {{Text}} object has two components to it: an internal array of bytes *and* a size that indicates which bytes are to be processed. Well, {{JsonSerDe}} was not taking the size into account, so when a zero-length {{Text}} object was submitted, it would still look at the entire internal byte array, ignoring the zero size, and produce duplicate rows where there should be no text.

https://github.com/apache/hive/blob/ae008b79b5d52ed6a38875b73025a505725828eb/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/JsonSerDe.java#L168

> JsonSerDe cannot handle json file with empty lines
> --------------------------------------------------
>
>                 Key: HIVE-15475
>                 URL: https://issues.apache.org/jira/browse/HIVE-15475
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 1.2.1
>            Reporter: pin_zhang
>            Priority: Major
>
> 1. start HiveServer2 in apache-hive-1.2.1
> 2 start a beeline connect to hive server2
> ADD JAR ADD JAR
> /home/apache-hive-1.2.1-bin/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar
> ;
> CREATE external TABLE my_table(a string, b bigint)
> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
> STORED AS TEXTFILE
> location 'file:///home/hive/json';
> 3 put a file with more than one new lines at the end of the file
> {"a":"a_1", "b" : 1}
> 4 run sql
> select * from my_table ;
> +-------------+-------------+--+
> | my_table.a  | my_table.b  |
> +-------------+-------------+--+
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> +-------------+-------------+--+

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
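
For anyone who wants to see the {{Text}} length problem described in the comment in isolation, here is a minimal sketch. It does not use Hadoop or Hive at all: {{ReusableText}} is a hypothetical stand-in that only mimics the two properties that matter here (a backing byte array that is reused between records, plus a separate valid length), and {{readIgnoringLength}} / {{readRespectingLength}} are illustrative helper names, not code from {{JsonSerDe}}.

```java
import java.nio.charset.StandardCharsets;

class TextLengthDemo {

    /** Hypothetical stand-in for a reusable text holder: backing buffer plus valid length. */
    static final class ReusableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Copies the value in; the old buffer is kept when it is already large enough. */
        void set(String value) {
            byte[] src = value.getBytes(StandardCharsets.UTF_8);
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length; // only the first 'length' bytes are valid
        }

        byte[] getBytes() { return bytes; }

        int getLength() { return length; }
    }

    /** Buggy read: decodes the whole backing array and ignores the valid length. */
    static String readIgnoringLength(ReusableText t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }

    /** Correct read: decodes only the first getLength() bytes. */
    static String readRespectingLength(ReusableText t) {
        return new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableText line = new ReusableText();
        line.set("{\"a\":\"a_1\", \"b\" : 1}"); // first record
        line.set("");                            // next record is an empty line

        // The buffer still holds the previous record, so the buggy read sees it again.
        System.out.println("ignoring length:   [" + readIgnoringLength(line) + "]");
        System.out.println("respecting length: [" + readRespectingLength(line) + "]");
    }
}
```

With the empty second record, the read that ignores the length still returns the first JSON document, which matches the duplicated rows in the report; the read bounded by the length returns an empty string.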