[ https://issues.apache.org/jira/browse/HIVE-26809?focusedWorklogId=841158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-841158 ]

ASF GitHub Bot logged work on HIVE-26809:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Jan/23 15:34
            Start Date: 23/Jan/23 15:34
    Worklog Time Spent: 10m 
      Work Description: abstractdog commented on code in PR #3833:
URL: https://github.com/apache/hive/pull/3833#discussion_r1084204246


##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java:
##########
@@ -224,7 +237,252 @@ private static void skipCompressedIndex(boolean isCompressed, PositionProvider i
     index.getNext();
   }
 
-  protected static class StringStreamReader extends StringTreeReader
+  public static class StringDictionaryTreeReaderHive extends TreeReader {

Review Comment:
   I was trying to understand the scenario here, and the way I see it: the
current PR code is not the right approach, because we end up with Hive on ORC
1.8.x but without an important optimization introduced in ORC-1060. So if we
have to copy some ORC code anyway, let's at least include ORC-1060 here
(sometimes I feel we should port such changes in separate JIRAs, but here we
can merge them together).
   I see that the basic confusion comes from the fact that in ORC we have a
common StringTreeReader which encapsulates different kinds of string readers,
like StringDirectTreeReader and StringDictionaryTreeReader, but in Hive's
StringStreamReader we have dictionary-related properties like
_dictionaryStream and _lengthStream, which is confusing. If we're already
subclassing ORC tree readers, we should follow the same structure:
   ```
   HIVE -> ORC
   StringStreamReader -> StringTreeReader (as it is now)
   StringDictionaryStreamReader -> StringDictionaryTreeReader
   StringDirectStreamReader -> StringDirectTreeReader
   ```
   In my opinion this change should be done regardless of the ORC 1.8
upgrade, and before it.
   Once we follow the ORC tree reader class hierarchy, we have a better chance
of adapting changes like ORC-1060, where e.g. only the dictionary reader was
changed.
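A minimal, self-contained Java sketch of the proposed mapping, for illustration only. All classes below are hypothetical stand-ins: the real ORC readers live in org.apache.orc.impl.TreeReaderFactory and take streams, encodings and a reader context in their constructors, and the real StringTreeReader picks its delegate from the column encoding rather than from a boolean. The sketch only shows the intended shape: StringStreamReader stays a thin wrapper, while dictionary-specific state would live in StringDictionaryStreamReader, so a dictionary-only ORC change such as ORC-1060 would map onto a single Hive class.

```java
public class ReaderHierarchySketch {
    // --- Hypothetical stand-ins for ORC classes (NOT the real ORC API) ---
    static abstract class TreeReader {
        abstract String name();
    }

    static class StringDirectTreeReader extends TreeReader {
        String name() { return "StringDirectTreeReader"; }
    }

    static class StringDictionaryTreeReader extends TreeReader {
        String name() { return "StringDictionaryTreeReader"; }
    }

    // Mirrors the idea of ORC's StringTreeReader: a common string reader
    // that delegates to a direct or a dictionary reader.
    static class StringTreeReader extends TreeReader {
        protected final TreeReader reader;
        StringTreeReader(TreeReader reader) { this.reader = reader; }
        String name() { return reader.name(); }
    }

    // --- Proposed Hive side: one Hive class per ORC class ---
    static class StringDirectStreamReader extends StringDirectTreeReader {
        // Would hold only the state needed for direct-encoded strings.
        @Override String name() { return "StringDirectStreamReader"; }
    }

    static class StringDictionaryStreamReader extends StringDictionaryTreeReader {
        // Would hold the dictionary-related state (e.g. _dictionaryStream),
        // so dictionary-only ORC changes are ported in one place.
        @Override String name() { return "StringDictionaryStreamReader"; }
    }

    static class StringStreamReader extends StringTreeReader {
        // Hypothetical flag standing in for the column encoding check.
        StringStreamReader(boolean dictionaryEncoded) {
            super(dictionaryEncoded
                    ? new StringDictionaryStreamReader()
                    : new StringDirectStreamReader());
        }
    }

    public static void main(String[] args) {
        System.out.println(new StringStreamReader(true).name());  // StringDictionaryStreamReader
        System.out.println(new StringStreamReader(false).name()); // StringDirectStreamReader
    }
}
```

With this layout, StringStreamReader no longer carries dictionary fields itself; it only routes to the encoding-specific Hive subclass.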
   
   Guys, if you agree with this, let's address the above problem in a separate
Hive ticket first. It's worth spending the time on it, especially if it turns
out to make the ORC 1.8 upgrade a clearer thing.
   





Issue Time Tracking
-------------------

    Worklog Id:     (was: 841158)
    Time Spent: 6h  (was: 5h 50m)

> Upgrade ORC to 1.8.1
> --------------------
>
>                 Key: HIVE-26809
>                 URL: https://issues.apache.org/jira/browse/HIVE-26809
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 4.0.0
>            Reporter: Dmitriy Fingerman
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 6h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
