-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70832/
-----------------------------------------------------------

Review request for hive, Ashutosh Chauhan, Gopal V, and Prasanth_J.


Bugs: HIVE-21815
    https://issues.apache.org/jira/browse/HIVE-21815


Repository: hive-git


Description
-------

Stats in ORC file are parsed twice
==================================

The ORC record reader unnecessarily parses the stripe statistics twice.

```
if (orcTail == null) {
  Reader orcReader = OrcFile.createReader(file.getPath(),
      OrcFile.readerOptions(context.conf)
          .filesystem(fs)
          .maxLength(AcidUtils.getLogicalLength(fs, file)));
  orcTail = new OrcTail(orcReader.getFileTail(), orcReader.getSerializedFileFooter(),
      file.getModificationTime());
  if (context.cacheStripeDetails) {
    context.footerCache.put(new FooterCacheKey(fsFileId, file.getPath()), orcTail);
  }
}
stripes = orcTail.getStripes();
stripeStats = orcTail.getStripeStatistics();
```

We go from Reader -> OrcTail -> StripeStatistics. stripeStats is read out of the orcTail, although the same statistics have already been read inside orcReader.getStripeStatistics().


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 3878bba4d3


Diff: https://reviews.apache.org/r/70832/diff/1/


Testing
-------

Ran the TestInputOutputFormat tests.


Thanks,

Krisztian Kasa