[ 
https://issues.apache.org/jira/browse/HIVE-28963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kokila N updated HIVE-28963:
----------------------------
    Description: 
An Insert Overwrite query with hive.acid.direct.insert.enabled=true writes data 
directly to the target directory (the actual table location). 
{code:java}
private static Path[] getDirectInsertDirectoryCandidatesRecursive(FileSystem fs,
    Path path, int skipLevels, PathFilter filter) throws IOException {
  String lastRelDir = null;
  HashSet<Path> results = new HashSet<Path>();
  String relRoot = Path.getPathWithoutSchemeAndAuthority(path).toString();
  if (!relRoot.endsWith(Path.SEPARATOR)) {
    relRoot += Path.SEPARATOR;
  }
  RemoteIterator<LocatedFileStatus> allFiles = fs.listFiles(path, true);
  while (allFiles.hasNext()) {
    LocatedFileStatus lfs = allFiles.next();
    // ...
  }
}
{code}
_*fs.listFiles(path, true)*_
    - This recursively lists {*}all files{*}, including ones that may become 
obsolete or be deleted during the iteration. So if Hive's cleaner deletes 
{{base_0002484}} _after_ it is discovered by {{listFiles()}} but _before_ 
{{hasNext()}} tries to access it (for metadata resolution), we get a 
{{FileNotFoundException}}.
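
As a rough illustration, the window can be forced deterministically on a local 
filesystem. This is a hypothetical sketch only; the {{/tmp/fnf_demo}} path and 
directory names are made up for the demo, not taken from the actual failure:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class FnfRaceDemo {
  public static void main(String[] args) throws Exception {
    // Hypothetical layout standing in for an ACID table directory.
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path table = new Path("/tmp/fnf_demo/table");
    Path base = new Path(table, "base_0002484");
    fs.mkdirs(base);
    fs.create(new Path(base, "bucket_00000")).close();

    // The top-level listing is materialized here, so the iterator already
    // "knows" about base_0002484 before it is deleted.
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(table, true);

    // Stand-in for the cleaner: remove the base directory after it was
    // discovered but before the iterator descends into it.
    fs.delete(base, true);

    // Expected to throw FileNotFoundException when hasNext() tries to
    // list the now-missing base_0002484 directory.
    while (it.hasNext()) {
      System.out.println(it.next().getPath());
    }
  }
}
{code}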

*Cleaner:*
  There is no issue with the cleaner itself, because it deletes only the 
files/directories that are marked as obsolete. 

There is a fix for this issue in upstream Hadoop (HADOOP-18662), but it is not 
present in Hadoop CDH 7.1.8.

This issue can also be handled from the Hive side, so I have created this 
upstream Hive Jira, HIVE-28963, to fix it; one possible approach is sketched 
below.

This scenario is a race condition and should occur only rarely: to trigger the 
issue, the cleaner has to delete those directories during the move stage of the 
insert, between the {{listFiles()}} and {{hasNext()}} calls. 
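
For reference, here is a minimal sketch of one possible Hive-side mitigation, 
assuming we simply retry the whole recursive listing when a 
{{FileNotFoundException}} surfaces mid-iteration. The helper name and retry 
bound are illustrative assumptions, not the actual patch:
{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public final class ListingUtil {
  // Illustrative only: retry the recursive listing from scratch when a
  // concurrently deleted path makes the current iterator stale.
  static List<LocatedFileStatus> listFilesWithFnfRetry(FileSystem fs, Path path,
      int maxAttempts) throws IOException {
    for (int attempt = 1; ; attempt++) {
      try {
        List<LocatedFileStatus> files = new ArrayList<>();
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(path, true);
        // Both hasNext() and next() can throw FileNotFoundException when the
        // cleaner removes an obsolete directory mid-iteration.
        while (it.hasNext()) {
          files.add(it.next());
        }
        return files;
      } catch (FileNotFoundException fnf) {
        if (attempt >= maxAttempts) {
          throw fnf; // still failing after retries; surface the error
        }
        // The listing snapshot was stale (obsolete dirs were removed
        // underneath us), so list again from scratch.
      }
    }
  }
}
{code}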

> Handle FNF when ListFiles with recursive fails
> ----------------------------------------------
>
>                 Key: HIVE-28963
>                 URL: https://issues.apache.org/jira/browse/HIVE-28963
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Kokila N
>            Assignee: Kokila N
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
