[
https://issues.apache.org/jira/browse/HIVE-28963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kokila N updated HIVE-28963:
----------------------------
Description:
An Insert Overwrite query with {{hive.acid.direct.insert.enabled=true}} writes data
directly to the target directory (the actual table location).
{code:java}
private static Path[] getDirectInsertDirectoryCandidatesRecursive(FileSystem fs,
    Path path, int skipLevels, PathFilter filter) throws IOException {
  String lastRelDir = null;
  HashSet<Path> results = new HashSet<Path>();
  String relRoot = Path.getPathWithoutSchemeAndAuthority(path).toString();
  if (!relRoot.endsWith(Path.SEPARATOR)) {
    relRoot += Path.SEPARATOR;
  }
  // Recursive listing of everything under the target directory.
  RemoteIterator<LocatedFileStatus> allFiles = fs.listFiles(path, true);
  while (allFiles.hasNext()) {
    LocatedFileStatus lfs = allFiles.next();
    // ...
  }
{code}
_*fs.listFiles(path, true)*_ - This recursively lists *all* files, including entries
that may become obsolete and be deleted while the iteration is still in progress.
So if Hive's cleaner deletes {{base_0002484}} _after_ it is discovered by
{{listFiles()}} but _before_ {{hasNext()}} tries to access it (for metadata
resolution), we get a {{FileNotFoundException}}.
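A minimal sketch of how the race can surface, assuming HDFS-style semantics and using an explicit delete as a stand-in for the ACID cleaner (the paths and class name are hypothetical, for illustration only):
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListFilesRaceSketch {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path table = new Path("/warehouse/acid_tbl");   // hypothetical table location

    // The recursive iterator descends into subdirectories lazily, one listing at a time.
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(table, true);

    // Stand-in for the ACID cleaner: an obsolete base directory disappears after the
    // iterator was created but before it has descended into that directory.
    fs.delete(new Path(table, "base_0002484"), true);

    while (it.hasNext()) {                 // the lazy listing of the deleted directory
      LocatedFileStatus lfs = it.next();   // can throw FileNotFoundException here
      System.out.println(lfs.getPath());
    }
  }
}
{code}
With the default {{FileSystem#listFiles}} iterator the exception typically surfaces from the lazy {{listLocatedStatus()}} call on the deleted subdirectory inside {{hasNext()}}, which is exactly the window described above.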
*Cleaner:*
There is no issue with the cleaner itself, because it deletes only files/directories
that are already marked as obsolete.
There is a fix for this issue in upstream Hadoop (HADOOP-18662), but it is not
present in the Hadoop shipped with CDH 7.1.8.
We can also handle this issue from the Hive side, so I have created this upstream
Hive Jira, HIVE-28963, to fix it there.
This scenario is a race condition and should occur rarely: to trigger it, the cleaner
has to delete those obsolete directories during the move stage of the insert, in the
window between the {{listFiles()}} call and the {{hasNext()}} call.
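One possible Hive-side mitigation, sketched here only as an illustration (the wrapper name, the retry bound, and the assumption that retrying the listing is acceptable are hypothetical, not the actual patch): re-run the whole recursive listing when an entry disappears mid-iteration, since a fresh listing no longer contains the directory the cleaner removed.
{code:java}
// Hypothetical wrapper around the method quoted above; the name and retry bound
// are assumptions for illustration, not the committed fix.
private static Path[] listDirectInsertCandidatesWithRetry(FileSystem fs, Path path,
    int skipLevels, PathFilter filter) throws IOException {
  final int maxAttempts = 3;  // small bound so genuine failures are still surfaced
  for (int attempt = 1; ; attempt++) {
    try {
      return getDirectInsertDirectoryCandidatesRecursive(fs, path, skipLevels, filter);
    } catch (FileNotFoundException e) {
      if (attempt >= maxAttempts) {
        throw e;
      }
      // An obsolete base/delta directory was removed by the cleaner between
      // listFiles() and hasNext(); a fresh listing should no longer see it.
    }
  }
}
{code}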
> Handle FNF when ListFiles with recursive fails
> ----------------------------------------------
>
> Key: HIVE-28963
> URL: https://issues.apache.org/jira/browse/HIVE-28963
> Project: Hive
> Issue Type: Bug
> Reporter: Kokila N
> Assignee: Kokila N
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)