[ https://issues.apache.org/jira/browse/HIVE-21815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862639#comment-16862639 ]
Gopal V commented on HIVE-21815:
--------------------------------

LGTM - +1

> Stats in ORC file are parsed twice
> ----------------------------------
>
>                 Key: HIVE-21815
>                 URL: https://issues.apache.org/jira/browse/HIVE-21815
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>            Reporter: Gopal V
>            Assignee: Krisztian Kasa
>            Priority: Major
>         Attachments: HIVE-21815.1.patch, HIVE-21815.1.patch,
>                      HIVE-21815.2.patch, orc-tail-getproto.png,
>                      tez-am-2x-protobuf.svg
>
>
> ORC record reader unnecessarily parses stats twice
> {code}
> if (orcTail == null) {
>   Reader orcReader = OrcFile.createReader(file.getPath(),
>       OrcFile.readerOptions(context.conf)
>           .filesystem(fs)
>           .maxLength(AcidUtils.getLogicalLength(fs, file)));
>   orcTail = new OrcTail(orcReader.getFileTail(),
>       orcReader.getSerializedFileFooter(),
>       file.getModificationTime());
>   if (context.cacheStripeDetails) {
>     context.footerCache.put(new FooterCacheKey(fsFileId,
>         file.getPath()), orcTail);
>   }
> }
> stripes = orcTail.getStripes();
> stripeStats = orcTail.getStripeStatistics();
> {code}
> We go from Reader -> OrcTail -> StripeStatistics.
> stripeStats is read out of the orcTail and is already read inside
> orcReader.getStripeStatistics().
> !orc-tail-getproto.png!
> [^tez-am-2x-protobuf.svg]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)