[ https://issues.apache.org/jira/browse/HIVE-24021?focusedWorklogId=468528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468528 ]
ASF GitHub Bot logged work on HIVE-24021:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 10/Aug/20 11:49
Start Date: 10/Aug/20 11:49
Worklog Time Spent: 10m
Work Description: klcopp opened a new pull request #1384:
URL: https://github.com/apache/hive/pull/1384

This is the second PR for this change.

### Why are the changes needed?
Hive reads insert-only tables truncated by Impala as if the truncation hadn't happened.

### Does this PR introduce _any_ user-facing change?
Yes, getAcidState will not filter out files with names beginning with "_empty".

### How was this patch tested?
Unit test

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 468528)
Time Spent: 0.5h (was: 20m)

> Read insert-only tables truncated by Impala correctly
> -----------------------------------------------------
>
> Key: HIVE-24021
> URL: https://issues.apache.org/jira/browse/HIVE-24021
> Project: Hive
> Issue Type: Bug
> Reporter: Karen Coppage
> Assignee: Karen Coppage
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Impala truncates insert-only tables by writing a base directory containing an
> empty file named "_empty" (as Hive should; see HIVE-20137). Generally in
> Hive, a file name beginning with an underscore denotes a temporary file that
> isn't supposed to be read by operations that didn't create it.
>
> Before HIVE-23495, getAcidState listed each directory in the table
> (HdfsUtils#listLocatedStatus) and filtered out directories with names
> beginning with an underscore or period, as they are presumably temporary.
> This allowed files called "_empty" to be read, since Hive checked the
> directory name and not the file name.
>
> After HIVE-23495, we recursively list each file in the table
> (AcidUtils#getHdfsDirSnapshots) with a filter that doesn't accept files with
> names beginning with an underscore or period, as they are presumably
> temporary. The "_empty" file is therefore skipped, and as a result Hive reads
> the table data as if the truncate operation had not happened.
>
> Since performance in getAcidState is important, probably the best solution is
> to make an exception in the filter and accept files with the name "_empty".

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
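
The filter exception proposed in the issue description can be sketched roughly as follows. This is only an illustration, not the code from the pull request: the class `AcidHiddenFileFilterSketch`, the constant `EMPTY_PREFIX`, and the helper `visibleFiles` are hypothetical names, and the sketch works on plain file-name strings rather than Hadoop `Path` objects so that it runs without Hive or Hadoop on the classpath. It follows the PR description in matching names that begin with "_empty".

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch; not Hive's actual AcidUtils code.
final class AcidHiddenFileFilterSketch {
    // Marker written by Impala into the new base directory on truncate (per the issue).
    static final String EMPTY_PREFIX = "_empty";

    // Rejects names that look temporary (leading '_' or '.'),
    // except names beginning with "_empty", which are accepted.
    static final Predicate<String> ACCEPT = name ->
        name.startsWith(EMPTY_PREFIX)
            || !(name.startsWith("_") || name.startsWith("."));

    // Applies the filter to a flat listing of file names, preserving order.
    static List<String> visibleFiles(List<String> fileNames) {
        return fileNames.stream().filter(ACCEPT).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up listing for illustration only.
        List<String> listing = List.of("_empty", "_tmp.000000_0", ".staging", "000000_0");
        System.out.println(visibleFiles(listing));
    }
}
```

With the made-up listing above, only "_empty" and "000000_0" pass the filter, so a base directory that contains just the marker file is still seen as a (deliberately empty) base rather than being ignored.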