[ https://issues.apache.org/jira/browse/HIVE-19985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568707#comment-16568707 ]
Gopal V commented on HIVE-19985:
--------------------------------

Tested with/without flag=true, with {{select count(1), sum(ss_net_profit) from store_sales;}}.

with:    31.383 seconds (cold run), 15.278 seconds (hot run)
without: 35.525 seconds (cold run), 22.934 seconds (hot run)

Latest patch has a big impact on the cached runs.

I can see there's an L1 cache miss hotspot in the double {{System.arraycopy}}: once to make the AcidWrapper and then again to copy it back in {{copyBase()}}.

{code}
 if (isAcidScan) {
+  int acidColCount = acidReader.includeAcidColumns() ? OrcInputFormat.getRootColumn(false) - 1 : 0;
...
+  int ixInVrb = includes.getPhysicalColumnIds().get(ixInReadSet) -
+      (acidReader.includeAcidColumns() ? 0 : OrcRecordUpdater.ROW);
{code}

Can that be changed to {{if (isAcidScan && innerReader.includeAcidColumns())}} to skip that entirely, because the offsets fall back the same way to the non-acid impl?

> ACID: Skip decoding the ROW__ID sections for read-only queries
> ---------------------------------------------------------------
>
>                 Key: HIVE-19985
>                 URL: https://issues.apache.org/jira/browse/HIVE-19985
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>            Reporter: Gopal V
>            Assignee: Eugene Koifman
>            Priority: Major
>              Labels: Branch3Candidate
>         Attachments: HIVE-19985.01.patch, HIVE-19985.04.patch
>
>
> For a base_n file there are no aborted transactions within the file and if
> there are no pending delete deltas, the entire ACID ROW__ID can be skipped
> for all read-only queries (i.e., SELECT), though it still needs to be projected
> out for MERGE, UPDATE and DELETE queries.
> This patch tries to entirely ignore the ACID ROW__ID fields for all tables
> where there are no possible deletes or aborted transactions for an ACID split.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)