[ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=541348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-541348
 ]

ASF GitHub Bot logged work on HIVE-23553:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Jan/21 22:50
            Start Date: 25/Jan/21 22:50
    Worklog Time Spent: 10m 
      Work Description: mustafaiman commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r564077084



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
##########
@@ -2585,6 +2590,7 @@ private static TreeReader getPrimitiveTreeReader(final 
int columnIndex,
             .setColumnEncoding(columnEncoding)
             .setVectors(vectors)
             .setContext(context)
+            .setIsInstant(columnType.getCategory()  == 
TypeDescription.Category.TIMESTAMP_INSTANT)

Review comment:
       Isn't this always `false`? Don't we need another case for 
`TIMESTAMP_INSTANT`?
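To make the concern concrete, here is a minimal, hypothetical sketch (not Hive's actual dispatch code; the enum and method names are stand-ins) of why the expression can be dead: if the surrounding switch only routes `Category.TIMESTAMP` to this builder, the comparison against `TIMESTAMP_INSTANT` can never be true unless a `TIMESTAMP_INSTANT` case falls through to the same branch.

```java
// Illustrative only: a simplified stand-in for ORC's TypeDescription.Category
// and the tree-reader dispatch, showing why setIsInstant(...) may always get false.
public class InstantDispatchSketch {

  enum Category { TIMESTAMP, TIMESTAMP_INSTANT, STRING }

  // Mirrors a dispatch that only routes TIMESTAMP to the timestamp reader:
  // inside that branch, category == TIMESTAMP_INSTANT is always false.
  static Boolean isInstantWithoutExtraCase(Category category) {
    switch (category) {
      case TIMESTAMP:
        return category == Category.TIMESTAMP_INSTANT; // always false here
      default:
        return null; // some other reader; the flag is never set at all
    }
  }

  // The fix the reviewer hints at: let TIMESTAMP_INSTANT fall through
  // to the same builder, so the comparison can actually become true.
  static Boolean isInstantWithFallThrough(Category category) {
    switch (category) {
      case TIMESTAMP:
      case TIMESTAMP_INSTANT:
        return category == Category.TIMESTAMP_INSTANT;
      default:
        return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(isInstantWithoutExtraCase(Category.TIMESTAMP));         // false
    System.out.println(isInstantWithoutExtraCase(Category.TIMESTAMP_INSTANT)); // null
    System.out.println(isInstantWithFallThrough(Category.TIMESTAMP_INSTANT));  // true
  }
}
```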

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/LlapDataReader.java
##########
@@ -0,0 +1,93 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.io.orc.encoded;
+
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.orc.CompressionCodec;
+import org.apache.orc.OrcFile;
+import org.apache.orc.OrcProto;
+import org.apache.orc.StripeInformation;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.impl.OrcIndex;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+/** An abstract data reader that IO formats can use to read bytes from 
underlying storage. */
+public interface LlapDataReader extends AutoCloseable, Cloneable {
+
+  /** Opens the DataReader, making it ready to use. */
+  void open() throws IOException;
+
+  OrcIndex readRowIndex(StripeInformation stripe,
+      TypeDescription fileSchema,
+      OrcProto.StripeFooter footer,
+      boolean ignoreNonUtf8BloomFilter,
+      boolean[] included,
+      OrcProto.RowIndex[] indexes,
+      boolean[] sargColumns,
+      OrcFile.WriterVersion version,
+      OrcProto.Stream.Kind[] bloomFilterKinds,
+      OrcProto.BloomFilterIndex[] bloomFilterIndices
+  ) throws IOException;
+
+  OrcProto.StripeFooter readStripeFooter(StripeInformation stripe) throws 
IOException;
+
+  /** Reads the data.
+   *
+   * Note that for cases such as zero-copy reads, the caller must release the 
disk ranges
+   * produced after being done with them. Call isTrackingDiskRanges to find 
out if this is needed.
+   * @param range List of disk ranges to read. Ranges with data will be 
ignored.
+   * @param baseOffset Base offset from the start of the file of the ranges in 
disk range list.
+   * @param doForceDirect Whether the data should be read into direct buffers.
+   * @return New or modified list of DiskRange-s, where all the ranges are 
filled with data.
+   */
+  DiskRangeList readFileData(
+      DiskRangeList range, long baseOffset, boolean doForceDirect) throws 
IOException;
+
+
+  /**
+   * Whether the user should release buffers created by readFileData. See 
readFileData javadoc.
+   */
+  boolean isTrackingDiskRanges();
+
+  /**
+   * Releases buffers created by readFileData. See readFileData javadoc.
+   * @param toRelease The buffer to release.
+   */
+  void releaseBuffer(ByteBuffer toRelease);
+
+  /**
+   * Clone the entire state of the DataReader with the assumption that the
+   * clone will be closed at a different time. Thus, any file handles in the
+   * implementation need to be cloned.
+   * @return a new instance
+   */
+  LlapDataReader clone();
+
+  @Override
+  void close() throws IOException;
+
+  /**
+   * Returns the compression codec used by this datareader.
+   * We should consider removing this from the interface.
+   * @return the compression codec
+   */
+  CompressionCodec getCompressionCodec();

Review comment:
       This interface looks like a copy of ORC's DataReader except for this 
method. ORC's DataReader returns a StreamOptions instead of a CompressionCodec. 
As far as I understand, StreamOptions includes the compression codec and more 
info. Moreover, I see other parts of the code already make use of 
StreamOptions, so it should be possible to integrate this part of the code too. 
I do not understand why the LlapDataReader interface is necessary.
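For context, the design difference the comment describes can be sketched roughly as follows. This is a hypothetical model, not the real ORC or Hive types: it only illustrates the idea that a StreamOptions-style object bundles the codec with other stream settings, so a reader exposing options can still serve callers that only need the codec.

```java
// Hypothetical sketch of the two interface shapes the review contrasts.
// None of these types are the real ORC/Hive classes.
public class StreamOptionsSketch {

  interface CompressionCodec { String name(); }

  // Models an options object: codec plus extra info (e.g. buffer size).
  static class StreamOptions {
    private CompressionCodec codec;
    private int bufferSize = 256 * 1024;

    StreamOptions withCodec(CompressionCodec codec) { this.codec = codec; return this; }
    StreamOptions withBufferSize(int size) { this.bufferSize = size; return this; }
    CompressionCodec getCodec() { return codec; }
    int getBufferSize() { return bufferSize; }
  }

  // Shape of the proposed LlapDataReader: returns only the codec.
  interface CodecOnlyReader { CompressionCodec getCompressionCodec(); }

  // Shape of an options-based reader: returns the richer object,
  // from which the codec is still recoverable.
  interface OptionsReader { StreamOptions getCompressionOptions(); }

  public static void main(String[] args) {
    CompressionCodec zlib = () -> "ZLIB";
    StreamOptions opts = new StreamOptions().withCodec(zlib).withBufferSize(64 * 1024);

    // An options-based reader satisfies callers that only need the codec,
    // which is the reviewer's argument for reusing the upstream interface.
    OptionsReader reader = () -> opts;
    System.out.println(reader.getCompressionOptions().getCodec().name()); // ZLIB
    System.out.println(reader.getCompressionOptions().getBufferSize());   // 65536
  }
}
```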

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/sarg/ConvertAstToSearchArg.java
##########
@@ -68,6 +70,10 @@
 
   private static final int KRYO_OUTPUT_BUFFER_SIZE = 4 * 1024;
   private static final int KRYO_OUTPUT_BUFFER_MAX_SIZE = 10 * 1024 * 1024;
+  private static final GregorianCalendar PROLEPTIC = new GregorianCalendar();

Review comment:
       This is not used anywhere?
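As an aside on the field itself (an illustrative note, not a claim about what the patch intends): the name suggests a proleptic Gregorian calendar, but a plain `new GregorianCalendar()` is hybrid Julian/Gregorian with a changeover at 1582-10-15. The usual way to get truly proleptic behavior from `java.util.GregorianCalendar` is to push the changeover to the beginning of time:

```java
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class ProlepticSketch {
  public static void main(String[] args) {
    // Proleptic: apply Gregorian rules for all dates, even before 1582.
    GregorianCalendar proleptic = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    proleptic.setGregorianChange(new Date(Long.MIN_VALUE));

    // Default hybrid calendar: interprets dates before the changeover as Julian.
    GregorianCalendar hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"));

    System.out.println(proleptic.getGregorianChange().getTime()); // Long.MIN_VALUE
    System.out.println(hybrid.getGregorianChange().getTime() == Long.MIN_VALUE); // false
  }
}
```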




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 541348)
    Time Spent: 3h 40m  (was: 3.5h)

> Upgrade ORC version to 1.6.7
> ----------------------------
>
>                 Key: HIVE-23553
>                 URL: https://issues.apache.org/jira/browse/HIVE-23553
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive currently depends on ORC 1.5.X, and in order to take advantage 
> of the latest ORC improvements, such as column encryption, we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store file footers, 
> tails, and streams, and to un/compress RG data. As there were many internal 
> changes from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), 
> the upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
