-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70832/
-----------------------------------------------------------

Review request for hive, Ashutosh Chauhan, Gopal V, and Prasanth_J.


Bugs: HIVE-21815
    https://issues.apache.org/jira/browse/HIVE-21815


Repository: hive-git


Description
-------

Stats in ORC file are parsed twice
==================================
The ORC record reader unnecessarily parses the stripe stats twice:

```
      if (orcTail == null) {
        Reader orcReader = OrcFile.createReader(file.getPath(),
            OrcFile.readerOptions(context.conf)
                .filesystem(fs)
                .maxLength(AcidUtils.getLogicalLength(fs, file)));
        orcTail = new OrcTail(orcReader.getFileTail(),
            orcReader.getSerializedFileFooter(),
            file.getModificationTime());
        if (context.cacheStripeDetails) {
          context.footerCache.put(new FooterCacheKey(fsFileId, file.getPath()),
              orcTail);
        }
      }
      stripes = orcTail.getStripes();
      stripeStats = orcTail.getStripeStatistics();
```

We go from Reader -> OrcTail -> StripeStatistics.

stripeStats is read out of the orcTail, even though the same statistics are
already read inside orcReader.getStripeStatistics(), so they end up being
parsed twice.
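
To make the cost concrete, here is a minimal, self-contained sketch of the pattern. It is a toy model, not the Hive or ORC code: FakeTail, FakeReader, and the parse counter are hypothetical stand-ins, and it assumes the reader keeps its parsed statistics once it has read them. It only illustrates why going back through the tail repeats work that the reader has already done.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseOnceSketch {

  /** Hypothetical stand-in for a file tail holding serialized stripe statistics. */
  static final class FakeTail {
    private final int[] serialized;
    int parseCount = 0; // how many times the statistics were deserialized

    FakeTail(int[] serialized) {
      this.serialized = serialized;
    }

    /** Deserializes the statistics on every call (the expensive step). */
    List<Integer> parseStripeStatistics() {
      parseCount++;
      List<Integer> stats = new ArrayList<>();
      for (int value : serialized) {
        stats.add(value);
      }
      return stats;
    }
  }

  /** Hypothetical stand-in for a reader that parses the statistics once and keeps them. */
  static final class FakeReader {
    private final FakeTail tail;
    private List<Integer> stripeStats; // filled on first access, reused afterwards

    FakeReader(FakeTail tail) {
      this.tail = tail;
    }

    FakeTail getTail() {
      return tail;
    }

    List<Integer> getStripeStatistics() {
      if (stripeStats == null) {
        stripeStats = tail.parseStripeStatistics();
      }
      return stripeStats;
    }
  }

  /** Reader parses the stats, then the caller parses them again via the tail. */
  public static int parsesWhenGoingThroughTail() {
    FakeTail tail = new FakeTail(new int[] {10, 20, 30});
    FakeReader reader = new FakeReader(tail);
    reader.getStripeStatistics();
    reader.getTail().parseStripeStatistics();
    return tail.parseCount;
  }

  /** Caller asks the reader both times, so the parsed result is reused. */
  public static int parsesWhenReusingReader() {
    FakeTail tail = new FakeTail(new int[] {10, 20, 30});
    FakeReader reader = new FakeReader(tail);
    reader.getStripeStatistics();
    reader.getStripeStatistics();
    return tail.parseCount;
  }

  public static void main(String[] args) {
    System.out.println("via tail:   " + parsesWhenGoingThroughTail() + " parse(s)");
    System.out.println("via reader: " + parsesWhenReusingReader() + " parse(s)");
  }
}
```

In this toy model the tail route deserializes the statistics a second time, while asking the reader again reuses the result it already holds.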


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 3878bba4d3 


Diff: https://reviews.apache.org/r/70832/diff/1/


Testing
-------

Ran the TestInputOutputFormat tests.


Thanks,

Krisztian Kasa
