On 2 Aug 2017, at 18:49, Ravi Prakash <ravihad...@gmail.com> wrote:

Thanks for starting the discussion Steve.

This is a prickly issue and unfortunately we are hostages of past decisions. 
Thanks a lot for attacking the problem in the first place and sticking with it.

Fortunately, because few people use logins this way, no regressions surfaced 
before the release... now only a couple have.


In my experience we have found a lot of places where AWS secrets were logged for 
everyone to see.

If there are places where this is still happening, I'd like to know; an email 
with the logs is fine. I know you can't stop it if you put the secrets inline, 
but if you handle the secrets properly, they should never be visible (which 
doesn't help diagnostics at all).
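
For anyone following along, "properly" means keeping the secrets in 
configuration (or, better, a JCEKS credential file) and out of the URI. A 
minimal sketch in Java, with made-up bucket name and key values:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class S3ASecretsDoneProperly {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Secrets live in the configuration, never in the URI, so a logged
      // Path can't leak them. (Values here are placeholders.)
      conf.set("fs.s3a.access.key", "MY-ACCESS-KEY-ID");
      conf.set("fs.s3a.secret.key", "MY-SECRET-ACCESS-KEY");
      Path p = new Path("s3a://mybucket/data");   // no inline secrets
      FileSystem fs = FileSystem.get(p.toUri(), conf);
      System.out.println(fs.exists(p));
    }
  }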


I'm not sure allowing people to do that is the right thing to do in the 
long term. We have to bite the bullet sometime.

If you set a credential provider list in the relevant fs.s3a option, you 
completely lose the ability to work off secrets-in-URLs: this actually allows 
people to turn off the feature entirely.
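
A sketch of that, assuming the option in question is 
fs.s3a.aws.credentials.provider (the provider classes named below ship with 
hadoop-aws and the AWS SDK):

  import org.apache.hadoop.conf.Configuration;

  public class PinS3AProviders {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // With an explicit provider list set, any user:secret embedded in
      // a URI is ignored: credentials come only from these providers.
      conf.set("fs.s3a.aws.credentials.provider",
          "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,"
          + "com.amazonaws.auth.EnvironmentVariableCredentialsProvider");
    }
  }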

Should we do that for Hadoop 3? I'd prefer it. Its key rationalisation was 
"lets me work with different credentials at the same time", but per-bucket 
configs do that.
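
Per-bucket configs meaning, roughly, the fs.s3a.bucket.<bucket>.* option 
pattern; a sketch with invented bucket names and keys:

  import org.apache.hadoop.conf.Configuration;

  public class PerBucketCredentials {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Different credentials for two buckets, no secrets in any URI.
      conf.set("fs.s3a.bucket.team-a.access.key", "KEY-A");
      conf.set("fs.s3a.bucket.team-a.secret.key", "SECRET-A");
      conf.set("fs.s3a.bucket.team-b.access.key", "KEY-B");
      conf.set("fs.s3a.bucket.team-b.secret.key", "SECRET-B");
    }
  }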

If we agree to cut support completely in 3.0, we could change the message 
in 2.8.2+ to "this will go away" and include a wiki link on how to prepare for 
it.

Perhaps we should do that in trunk (3.0.0)? To unbreak clients of Hadoop 2.x we 
can go with Vinayakumar's proposal, but only in branch-2. Of course, technically 
we already have hadoop-2.8.0 out with this, but I agree we can put the fix in 
2.8.2.

$0.02
Ravi

On Wed, Aug 2, 2017 at 5:52 AM, Steve Loughran <ste...@hortonworks.com> wrote:


HADOOP-3733 <https://issues.apache.org/jira/browse/HADOOP-3733> stripped the 
user:password secret out of s3, s3a and s3n URLs on security grounds: 
everything logged Path entries without ever considering that they contained 
secret credentials.

But that turns out to break things, as noted in HADOOP-14439... you can no 
longer go Path -> String -> Path without the authentication details being lost, 
and of course, guess how paths are often marshalled around? As strings (after 
all, they weren't serializable until recently).
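
Concretely, the round trip that breaks, sketched with invented credentials:

  import org.apache.hadoop.fs.Path;

  public class PathRoundTrip {
    public static void main(String[] args) {
      Path p = new Path("s3a://id:secret@mybucket/data");
      String s = p.toString();   // the secret is stripped here...
      Path back = new Path(s);   // ...so the rebuilt Path can't authenticate
      System.out.println(back);  // prints the path without the secret
    }
  }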

Vinayakumar has proposed a patch retaining the secrets, at least enough for 
distcp:

https://issues.apache.org/jira/browse/HADOOP-3733?focusedCommentId=16110297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16110297

I think I'm going to go with this, once I get the tests & testing to go with 
it, and if it's enough to work with Spark too... targeting 2.8.2 if it's not 
too late.

If there's a risk, it's that if someone puts secrets into s3 URIs, the secrets 
are more likely to be logged. But even with the current code, there's no way to 
guarantee that the secrets will never be logged. The danger comes from having 
id:secret credentials in the URI at all, something people will be told off for 
doing.



