[ https://issues.apache.org/jira/browse/HIVE-26809?focusedWorklogId=841158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-841158 ]
ASF GitHub Bot logged work on HIVE-26809:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Jan/23 15:34
            Start Date: 23/Jan/23 15:34
    Worklog Time Spent: 10m
      Work Description: abstractdog commented on code in PR #3833:
URL: https://github.com/apache/hive/pull/3833#discussion_r1084204246

##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java:
##########
@@ -224,7 +237,252 @@ private static void skipCompressedIndex(boolean isCompressed, PositionProvider i
     index.getNext();
   }

-  protected static class StringStreamReader extends StringTreeReader
+  public static class StringDictionaryTreeReaderHive extends TreeReader {

Review Comment:
I was trying to understand the scenario here, and the way I see it: the current PR code is not the proper one, as we end up with Hive on ORC 1.8.x but without an important optimization introduced in ORC-1060. So if we have to copy some ORC code anyway, let's at least have ORC-1060 here (sometimes I feel we need to port changes in separate jiras, but here we can merge them together).

I see that the basic confusion comes from the fact that in ORC we have a common StringTreeReader which encapsulates different kinds of string readers, like StringDirectTreeReader and StringDictionaryTreeReader, but in Hive's StringStreamReader we have dictionary-related properties like _dictionaryStream and _lengthStream, which is confusing. If we're already subclassing ORC tree readers, we should follow the ORC hierarchy like:
```
HIVE -> ORC
StringStreamReader -> StringTreeReader (as it is now)
StringDictionaryStreamReader -> StringDictionaryTreeReader
StringDirectStreamReader -> StringDirectTreeReader
```

In my opinion, this is a change that should be done regardless of the ORC 1.8 upgrade, and prior to it.

Once we follow the ORC tree reader class hierarchy, we have a better chance to adapt changes like ORC-1060, where e.g. only the dictionary reader has been changed.

Guys, if you agree with this, let's address the above problem in a separate Hive ticket first. It's worth spending the time on it, especially if it turns out that this makes the ORC 1.8 upgrade a clearer thing.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 841158)
    Time Spent: 6h  (was: 5h 50m)

> Upgrade ORC to 1.8.1
> --------------------
>
>                 Key: HIVE-26809
>                 URL: https://issues.apache.org/jira/browse/HIVE-26809
>             Project: Hive
>          Issue Type: Improvement
>   Affects Versions: 4.0.0
>            Reporter: Dmitriy Fingerman
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 6h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)