[ https://issues.apache.org/jira/browse/HIVE-21815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856150#comment-16856150 ]

Gopal V commented on HIVE-21815:
--------------------------------

We need to change that to look like (approx code)

{code}
if (orcTail == null) {
  Reader orcReader = OrcFile.createReader(file.getPath(),
      OrcFile.readerOptions(context.conf)
          .filesystem(fs)
          .maxLength(AcidUtils.getLogicalLength(fs, file)));
  orcTail = new OrcTail(orcReader.getFileTail(),
      orcReader.getSerializedFileFooter(),
      file.getModificationTime());
  if (context.cacheStripeDetails) {
    context.footerCache.put(new FooterCacheKey(fsFileId, file.getPath()), orcTail);
  }
  stripes = orcReader.getStripes();
  stripeStats = orcReader.getStripeStatistics();
} else {
  stripes = orcTail.getStripes();
  stripeStats = orcTail.getStripeStatistics();
}
{code}

> Stats in ORC file are parsed twice
> ----------------------------------
>
>                 Key: HIVE-21815
>                 URL: https://issues.apache.org/jira/browse/HIVE-21815
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>            Reporter: Gopal V
>            Priority: Major
>         Attachments: orc-tail-getproto.png, tez-am-2x-protobuf.svg
>
>
> ORC record reader unnecessarily parses stats twice
> {code}
>         if (orcTail == null) {
>           Reader orcReader = OrcFile.createReader(file.getPath(),
>               OrcFile.readerOptions(context.conf)
>                   .filesystem(fs)
>                   .maxLength(AcidUtils.getLogicalLength(fs, file)));
>           orcTail = new OrcTail(orcReader.getFileTail(),
>               orcReader.getSerializedFileFooter(),
>               file.getModificationTime());
>           if (context.cacheStripeDetails) {
>             context.footerCache.put(new FooterCacheKey(fsFileId,
>                 file.getPath()), orcTail);
>           }
>         }
>         stripes = orcTail.getStripes();
>         stripeStats = orcTail.getStripeStatistics();
> {code}
> We go from Reader -> OrcTail -> StripeStatistics.
> stripeStats is read out of the orcTail and is already read inside
> orcReader.getStripeStatistics().
> !orc-tail-getproto.png!
> [^tez-am-2x-protobuf.svg]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
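
To illustrate the double parse described in the issue, here is a minimal, self-contained sketch. It does not use the ORC API: `Reader`, `Tail`, `decode`, `statsViaTail`, and `statsViaReader` are hypothetical stand-ins, and the assumption that the reader decodes the stats eagerly when opened is a simplification made only so the decode count is easy to observe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ParseOnceSketch {
    // Counts how many times the simulated stats payload is decoded.
    static final AtomicInteger parseCount = new AtomicInteger();

    // Hypothetical stand-in for a cached tail: it keeps only serialized bytes
    // and decodes them every time the statistics are requested.
    static final class Tail {
        private final byte[] serializedStats;
        Tail(byte[] serializedStats) { this.serializedStats = serializedStats; }
        List<Integer> getStripeStatistics() { return decode(serializedStats); }
    }

    // Hypothetical stand-in for a reader: it decodes the stats once on open
    // (an assumption of this sketch) and hands back the decoded list.
    static final class Reader {
        private final byte[] raw;
        private final List<Integer> stripeStats;
        Reader(byte[] raw) { this.raw = raw; this.stripeStats = decode(raw); }
        List<Integer> getStripeStatistics() { return stripeStats; }
        Tail getTail() { return new Tail(raw); }
    }

    // Simulated "protobuf parse": each call bumps the counter.
    static List<Integer> decode(byte[] raw) {
        parseCount.incrementAndGet();
        List<Integer> out = new ArrayList<>();
        for (byte b : raw) out.add((int) b);
        return out;
    }

    // Shape of the code quoted in the issue: on a cache miss the reader
    // decodes, then the freshly built tail decodes the same bytes again.
    static List<Integer> statsViaTail(Tail cached, byte[] raw) {
        Tail tail = cached;
        if (tail == null) {
            Reader reader = new Reader(raw);
            tail = reader.getTail();
        }
        return tail.getStripeStatistics();
    }

    // Shape of the proposed change: on a cache miss reuse what the reader
    // already decoded; only a cache hit goes through the tail.
    static List<Integer> statsViaReader(Tail cached, byte[] raw) {
        if (cached == null) {
            Reader reader = new Reader(raw);
            // A real implementation would also cache reader.getTail() here.
            return reader.getStripeStatistics();
        }
        return cached.getStripeStatistics();
    }

    public static void main(String[] args) {
        byte[] raw = {1, 2, 3};
        parseCount.set(0);
        statsViaTail(null, raw);
        System.out.println("via tail:   " + parseCount.get() + " decodes");
        parseCount.set(0);
        statsViaReader(null, raw);
        System.out.println("via reader: " + parseCount.get() + " decodes");
    }
}
```

In this sketch the cache-miss path drops from two decodes to one, while the cache-hit path is unchanged because only the tail is available there.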