[ 
https://issues.apache.org/jira/browse/HIVE-22936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Redis Liu updated HIVE-22936:
-----------------------------
    Description: 
h2. Symptom

I was running Hive over an AWS S3 Inventory report, which uses 
SymlinkTextInputFormat. Each line of the symlink file is the fully qualified 
S3 URL of one data file, for example:
{code:java}
s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE
s3://inventory-bucket/bucket1/2020-02-05-11-23-00-BCEDCCDD{code}
When I have the following setting:
{code:java}
set hive.rework.mapredwork=true;  
{code}
The job fails with a *NullPointerException* and no stack trace.
h2. Cause

Each line of a symlink file may be an arbitrary fully qualified FS path, but 
SymbolicInputFormat uses the default FS instance to get the status of the data 
files. That lookup fails (and returns null) when the scheme of a data file 
differs from that of Hive's default FS.

Code point:
[https://github.com/apache/hive/blob/cfc12f05f0c034f9aad149960e58d40902e0dcfe/ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java#L78]
{code:java}
// "fileSystem" may not be able to list status for given file uri.
FileStatus[] matches = fileSystem.globStatus(new Path(line));
{code}
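To illustrate the mismatch described above, here is a minimal standalone sketch that compares the scheme of a symlink line with the scheme of the default FS. The class name, method name, and the {{hdfs://namenode:8020}} default FS value are hypothetical and chosen only for illustration; none of this is Hive code, and it is not necessarily what the attached patch does. In Hadoop, the usual way to avoid the mismatch is to obtain the file system from the path itself with {{Path.getFileSystem(Configuration)}} rather than reusing the default instance.

```java
import java.net.URI;

/** Illustration only: this class is hypothetical and not part of Hive. */
final class SymlinkSchemeCheck {

    /**
     * Returns true when the symlink target names a scheme that differs from
     * the scheme of the default file system URI.
     */
    static boolean differsFromDefaultFs(String symlinkLine, String defaultFsUri) {
        String targetScheme = URI.create(symlinkLine.trim()).getScheme();
        String defaultScheme = URI.create(defaultFsUri).getScheme();
        // A target without a scheme is resolved against the default file system.
        return targetScheme != null && !targetScheme.equalsIgnoreCase(defaultScheme);
    }

    public static void main(String[] args) {
        // Assumed default FS value; a real cluster sets this via fs.defaultFS.
        String defaultFs = "hdfs://namenode:8020";
        String[] lines = {
            "s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE",
            "/warehouse/data/part-00000"
        };
        for (String line : lines) {
            System.out.println(line + " -> " + differsFromDefaultFs(line, defaultFs));
        }
    }
}
```

Any line for which the check returns true is one that the default FS instance should not be asked about; in Hadoop code that would mean calling {{globStatus}} on the file system returned by {{new Path(line).getFileSystem(conf)}} and handling a null result explicitly.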
h2. Fix

Please see the attached npe-symbolic-inputformat.patch.

 


> NPE in SymbolicInputFormat
> --------------------------
>
>                 Key: HIVE-22936
>                 URL: https://issues.apache.org/jira/browse/HIVE-22936
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 3.1.2
>            Reporter: Redis Liu
>            Priority: Major
>         Attachments: npe-symbolic-inputformat.patch
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
