[ https://issues.apache.org/jira/browse/HIVE-21328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Mollitor updated HIVE-21328:
----------------------------------
    Attachment: HIVE-21328.1.patch

> Call To Hadoop Text getBytes() Without Call to getLength()
> ----------------------------------------------------------
>
>                 Key: HIVE-21328
>                 URL: https://issues.apache.org/jira/browse/HIVE-21328
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 4.0.0, 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>         Attachments: HIVE-21328.1.patch
>
>
> I'm not sure if there is actually a bug, but this looks highly suspect:
> {code:java}
>   public Object set(final Object o, final Text text) {
>     return new BytesWritable(text == null ? null : text.getBytes());
>   }
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetStringInspector.java#L104-L106
> There are two components to a Text object. There are the internal bytes and
> the length of the bytes. The two are independent. I.e., a quick "reset" on
> the Text object simply sets the internal length counter to zero. This code
> is potentially looking at obsolete data that it shouldn't be seeing because
> it is not considering the length of the Text.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
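
The concern described in the issue can be illustrated without Hadoop on the classpath. The sketch below is only an assumption-laden model: `ReusableText` is a hypothetical stand-in that mimics how a Text-like object keeps a reusable backing array separate from its valid length, and `withoutLength` / `withLength` are illustrative helper names, not Hive or Hadoop methods. It is not the contents of the attached patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of the getBytes()/getLength() split discussed in the issue.
// ReusableText is NOT org.apache.hadoop.io.Text; it only imitates the relevant behavior.
final class TextLengthSketch {

    static final class ReusableText {
        private byte[] bytes = new byte[0]; // backing array, reused across set() calls
        private int length = 0;             // number of valid bytes in the backing array

        // Overwrites the content, growing the backing array only when it is too small.
        void set(String s) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            if (bytes.length < utf8.length) {
                bytes = new byte[utf8.length];
            }
            System.arraycopy(utf8, 0, bytes, 0, utf8.length);
            length = utf8.length;
        }

        // Returns the raw backing array; only the first getLength() bytes are valid.
        byte[] getBytes() {
            return bytes;
        }

        int getLength() {
            return length;
        }
    }

    // Mirrors the suspect pattern: hands out the whole backing array.
    static byte[] withoutLength(ReusableText text) {
        return text == null ? null : text.getBytes();
    }

    // Copies only the valid prefix of the backing array.
    static byte[] withLength(ReusableText text) {
        return text == null ? null : Arrays.copyOf(text.getBytes(), text.getLength());
    }

    public static void main(String[] args) {
        ReusableText text = new ReusableText();
        text.set("hello world");
        text.set("hi"); // the shorter value reuses the existing 11-byte array

        // Stale bytes from the previous value leak through: prints "hillo world"
        System.out.println(new String(withoutLength(text), StandardCharsets.UTF_8));
        // Only the valid bytes are copied: prints "hi"
        System.out.println(new String(withLength(text), StandardCharsets.UTF_8));
    }
}
```

With the real Hadoop class, the analogous options would be to pair `getBytes()` with `getLength()`, or to call `Text.copyBytes()`, which the Hadoop Javadoc describes as returning an array of exactly the valid length. Whether the Parquet inspector ever receives a reused `Text` in practice is the open question the reporter raises.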