On 2 Aug 2017, at 18:49, Ravi Prakash <ravihad...@gmail.com> wrote:
> Thanks for starting the discussion Steve. This is a prickly issue and unfortunately we are hostages of past decisions.

Thanks a lot for attacking the problem in the first place and sticking with it. Fortunately, because few people use logins this way, no regressions surfaced before the release... now only a couple have.

> In my experience we have found a lot of places that AWS secrets were logged for everyone to see.

If there are places where this is still happening, I'd like to know; an email with the logs is fine. I know you can't stop it if you put the secrets inline, but if you handle the secrets properly, they should never be visible (which doesn't help diagnostics, not at all).

> I'm not sure allowing people to do that is the right thing to do in the long term. We have to bite the bullet sometime.

If you set a credential provider list in the relevant fs.s3a option, you completely lose the ability to work off secrets-in-URLs: this actually lets people turn the feature off entirely. Should we do that for Hadoop 3? I'd prefer it. Its key rationale was "lets me work with different credentials at the same time", but per-bucket configs do that.

If we agree to cut support completely in 3.0, we could change the message in 2.8.2+ to "this will go away" and include a wiki link on how to prepare for it.

> Perhaps we should do that in trunk (3.0.0)? To unbreak clients of Hadoop-2.x we can go with Vinayakumar's proposal, but only in branch-2.

Of course, technically we have hadoop-2.8.0 already out with this, but I agree we can put the fix in 2.8.2.

> $0.02
> Ravi
>
> On Wed, Aug 2, 2017 at 5:52 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> HADOOP-3733 <https://issues.apache.org/jira/browse/HADOOP-3733> stripped out the user:password secret from the s3, s3n and s3a URLs on security grounds: everything logged Path entries without ever considering that they contained secret credentials.
> but that turns out to break things, as noted in HADOOP-14439... you can't any more go Path -> String -> Path without the authentication details being lost, and of course, guess how paths are often marshalled around? As strings (after all, they weren't serializable until recently).
>
> Vinayakumar has proposed a patch reinstating the secrets, at least enough for distcp:
> https://issues.apache.org/jira/browse/HADOOP-3733?focusedCommentId=16110297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16110297
>
> I think I'm going to go with this, once I get the tests & testing to go with it, and if it's enough to work with Spark too... targeting 2.8.2 if it's not too late.
>
> If there's a risk, it's that if someone puts secrets into s3 URIs, the secrets are more likely to be logged. But even with the current code, there's no way to guarantee that the secrets will never be logged. The danger comes from having id:secret credentials in the URI, something people will be told off for doing.
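For anyone following along who hasn't used the alternatives being discussed: per-bucket configuration lets each bucket carry its own credentials in core-site.xml, which covers the "different credentials at the same time" use case without ever putting secrets in a URI. A rough sketch; the bucket name `nightly`, the key values, and the jceks path are all made-up placeholders:

```xml
<!-- Credentials scoped to a single bucket, s3a://nightly/ -->
<property>
  <name>fs.s3a.bucket.nightly.access.key</name>
  <value>AKIDEXAMPLE</value>
</property>
<property>
  <name>fs.s3a.bucket.nightly.secret.key</name>
  <value>placeholder-secret</value>
</property>
<!-- Or keep secrets out of configuration files entirely by pointing
     at a Hadoop credential provider store. -->
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs@namenode/secrets/aws.jceks</value>
</property>
```

Either route keeps the credentials out of Path strings, so there is nothing for a log line to leak.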
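To make the Path -> String -> Path breakage concrete, here is a small illustration using plain `java.net.URI` rather than Hadoop's own `Path` class; the `stripUserInfo` helper is a stand-in for the stripping behaviour being discussed, not the actual Hadoop code, and the bucket and key names are invented:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SecretUriDemo {
    // Stand-in for the stripping behaviour: rebuild the URI without its
    // userInfo (the id:secret pair). Not the actual Hadoop implementation.
    static URI stripUserInfo(URI uri) throws URISyntaxException {
        return new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(),
                       uri.getPath(), uri.getQuery(), uri.getFragment());
    }

    public static void main(String[] args) throws Exception {
        URI withSecret = new URI("s3a://AKIDEXAMPLE:mySecret@bucket/data/part-0000");

        // A plain string round trip keeps the credentials...
        URI roundTrip = new URI(withSecret.toString());
        System.out.println(roundTrip.getUserInfo());   // AKIDEXAMPLE:mySecret

        // ...which is exactly why logging such URIs leaks secrets. But once
        // the userInfo is stripped, the string form can never be turned back
        // into an authenticated URI: the credentials are simply gone.
        URI stripped = stripUserInfo(withSecret);
        System.out.println(stripped);                  // s3a://bucket/data/part-0000
        System.out.println(stripped.getUserInfo());    // null
    }
}
```

This is the tension in the thread: keep the secret in `toString()` and every log line is a leak; strip it and any code that marshals paths as strings silently loses its credentials.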