[ 
https://issues.apache.org/jira/browse/HADOOP-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386993#comment-17386993
 ] 

Bobby Wang commented on HADOOP-17812:
-------------------------------------

Hi [~ste...@apache.org]

Thanks for your comments on the PR and the JIRA. I modified the unit test to 
include the repro steps described in this JIRA, and all of the unit tests now pass.

I followed the steps to run the integration tests against our internal S3 storage, 
and some tests failed. I was told that some failing tests are expected, so I 
uploaded the failsafe-report.html in the attachment named 
s3a-test.tar.gz. Could you help check whether the failed test cases are expected?

BTW, I configured only the items below in auth-keys.xml:

{quote}<configuration>

  <property>
    <name>test.fs.s3a.name</name>
    <value>s3a://testawss3a/</value>
  </property>

  <property>
    <name>fs.contract.test.fs.s3a</name>
    <value>${test.fs.s3a.name}</value>
  </property>

  <property>
    <name>fs.s3a.access.key</name>
    <value>XXX</value>
  </property>

  <property>
    <name>fs.s3a.secret.key</name>
    <value>XXXXXX</value>
  </property>

  <property>
    <name>fs.s3a.endpoint</name>
    <value>XXXXX</value>
  </property>

  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>

</configuration>
{quote}

> NPE in S3AInputStream read() after failure to reconnect to store
> ----------------------------------------------------------------
>
>                 Key: HADOOP-17812
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17812
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.2.2, 3.3.1
>            Reporter: Bobby Wang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: s3a-test.tar.gz
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When [reading from S3A 
> storage|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L450],
>  an SSLException (which extends IOException) can occur, which triggers 
> [onReadFailure|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L458].
> onReadFailure calls reopen(), which first closes the original 
> *wrappedStream* and sets *wrappedStream = null*, then tries to 
> [re-obtain 
> *wrappedStream*|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L184].
>  But if the preceding code [obtaining the 
> S3Object|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L183]
>  throws an exception, *wrappedStream* remains null.
> The 
> [retry|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L446]
>  mechanism may then re-execute 
> [wrappedStream.read|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L450]
>  and cause an NPE.
>  
> For more details, please refer to 
> [https://github.com/NVIDIA/spark-rapids/issues/2915]
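
The failure sequence described above can be sketched as a minimal, self-contained simulation. This is not the actual S3AInputStream code; the class and field names below are invented for illustration, and the "GET" is simulated as always failing so that the field stays null when the retry fires:

```java
// Sketch of the HADOOP-17812 failure sequence (not real S3AInputStream code):
// 1. read() fails with an IOException; the recovery path closes the wrapped
//    stream and nulls the field, then tries to re-open the connection.
// 2. The re-open itself fails before the field is reassigned.
// 3. The retry loop calls wrappedStream.read() again -> NullPointerException.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class NpeSketch {
    private InputStream wrappedStream = new ByteArrayInputStream(new byte[]{42});
    private int attempts = 0;

    // Simulates reopen(): close the old stream, then try to obtain a new one.
    // Here the simulated GET always fails, so wrappedStream stays null.
    private void reopen() throws IOException {
        if (wrappedStream != null) {
            wrappedStream.close();
            wrappedStream = null;            // field cleared before the re-get
        }
        throw new IOException("simulated failure re-opening the connection");
        // wrappedStream = <new object stream>;   // never reached
    }

    public int read() throws IOException {
        while (true) {
            try {
                if (attempts++ == 0) {
                    throw new IOException("simulated SSLException on read");
                }
                return wrappedStream.read(); // NPE on retry: stream is null
            } catch (IOException e) {
                try {
                    reopen();                // fails, leaving the null behind
                } catch (IOException ignored) {
                    // a lenient retry policy swallows this and loops again
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            new NpeSketch().read();
        } catch (NullPointerException expected) {
            System.out.println("NPE reproduced, as in HADOOP-17812");
        }
    }
}
```

One way to avoid the NPE in a sketch like this is to check whether the stream is null before the retried read and re-open it first; where the real fix lands in S3AInputStream's retry path is what the attached PR addresses.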



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
