[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-13811.
-------------------------------------
    Resolution: Cannot Reproduce

No Vinod, this has nothing to do with HADOOP-13786, except that the improved 
retry logic there may make transient problems go away.

I've actually got some insight on a possible cause of [~lminer]'s stack trace 
by way of Ryan Blue and the Netflix Experience.

# the V1 list API always returns 5000 entries per page (as set in 
{{fs.s3a.paging.maximum}})
# except for the final page
# if you have versioning turned on in your bucket, deleted entries retain 
tombstone markers with references to their versions
# which surface on the S3 side of list calls, but get stripped out of 
the response
# so...for a very large tree, you may end up with S3 having to keep a channel 
open while it skips thousands to millions of deleted objects before it can 
find actual ones to return.
# which can time out connections.
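
To make the sequence above concrete, here is a toy model in plain Java. It is not S3 or S3A code: the class {{ListingModel}}, its methods and the "scan budget" parameter are all hypothetical names invented for this sketch. A {{null}} entry stands in for a delete marker. With an unlimited budget, a call keeps scanning until the page is full, which is the behaviour described above. With a capped budget, a call returns a short (possibly empty) page instead, which is roughly what the v2 API is said to do.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of paged listing over a key space with delete markers. Not S3 code. */
final class ListingModel {

    /** Result of one simulated list call. */
    static final class Page {
        final List<String> keys; // live keys returned to the client
        final int scanned;       // entries the server walked during this call
        final int next;          // index to resume from, or -1 when listing is done

        Page(List<String> keys, int scanned, int next) {
            this.keys = keys;
            this.scanned = scanned;
            this.next = next;
        }
    }

    /**
     * One list call. A null entry models a delete marker, which is scanned
     * but never returned. scanBudget caps the entries examined per call;
     * Integer.MAX_VALUE models "scan until the page is full".
     */
    static Page list(String[] entries, int start, int pageSize, int scanBudget) {
        if (pageSize <= 0 || scanBudget <= 0) {
            throw new IllegalArgumentException("pageSize and scanBudget must be > 0");
        }
        List<String> keys = new ArrayList<>();
        int i = start;
        int scanned = 0;
        while (i < entries.length && keys.size() < pageSize && scanned < scanBudget) {
            if (entries[i] != null) {
                keys.add(entries[i]);
            }
            i++;
            scanned++;
        }
        return new Page(keys, scanned, i < entries.length ? i : -1);
    }

    /** Largest number of entries scanned by any single call of a full listing. */
    static int worstCallScan(String[] entries, int pageSize, int scanBudget) {
        int worst = 0;
        int start = 0;
        while (start >= 0) {
            Page p = list(entries, start, pageSize, scanBudget);
            worst = Math.max(worst, p.scanned);
            start = p.next;
        }
        return worst;
    }

    /** Builds `markers` delete markers followed by `live` live keys. */
    static String[] tree(int markers, int live) {
        String[] entries = new String[markers + live];
        for (int i = 0; i < live; i++) {
            entries[markers + i] = "key-" + i;
        }
        return entries;
    }

    public static void main(String[] args) {
        String[] t = tree(100_000, 10);
        System.out.println("unbounded: " + worstCallScan(t, 5, Integer.MAX_VALUE));
        System.out.println("bounded:   " + worstCallScan(t, 5, 1000));
    }
}
```

In this model the unbounded variant has to walk all 100,000 markers inside a single call before it can return anything, while the bounded variant never examines more than 1000 entries per call, at the price of many more round trips. The real timeout depends on how long S3 takes to skip the markers, which this sketch does not attempt to model.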

The v2 API apparently fixes this by returning smaller pages when needed. With 
the move to v2 by default (HADOOP-13421), this error may have gone away. 
Marking the issue as related to that and closing as Cannot Reproduce.

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to 
> sanitize XML document destined for handler class
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13811
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13811
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> Occasionally, getFileStatus() fails with a stack trace starting 
> with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document 
> destined for handler class}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

