On Mon, Sep 13, 2010 at 10:59 AM, Owen O'Malley <omal...@apache.org> wrote:

> On Mon, Sep 13, 2010 at 10:05 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> > This is not MR-specific, since the strangely named hadoop.job.ugi
> determines
> > HDFS permissions as well.
>
> Yeah, after I hit send, I realized that I should have used common-dev.
> This is really a dev issue.
>
> > "or the user must write a custom group mapper" above refers to this
> plugin
> > capability. But I think most users do not want to spend the time to write
> > (or even setup) such a plugin beyond the default shell-based mapping
> > service.
>
> Sure, which is why it is easiest to just have the (hopefully disabled)
> user accounts on the jt/nn. Any installs > 100 nodes should be using
> HADOOP-6864 to avoid the fork in the JT/NN.
>

Yep, but there are plenty of 10-node clusters out there doing important
work at small startups or single-use-case installations, too. We need to
provide scalability and security features that work for the 100+ node
clusters without leaving the beginners in the dust.


>
> > As someone who spends an awful lot of time doing downstream support of
> lots
> > of different clusters, I actually disagree.
>
> Normal applications never need to do doAs. They run as the default
> user. This only comes up in servers that deal with multiple users. In
> *that* context, it sucks having servers that only work in non-secure
> mode. If some server X only works without security that sucks. Doing
> doAs isn't harder, it is just different. Having two different
> semantics models *will* cause lots of grief.
>

I agree that all real (i.e. community) projects should support both secure
and non-secure modes and shouldn't be using hadoop.job.ugi to impersonate
users. But I think there are plenty of people out there who have built small
webapps, shell scripts, cron jobs, etc. that use hadoop.job.ugi on some
shared account to impersonate other users. Perhaps I'm estimating
incorrectly - that's why I wanted this discussion on a user-facing list
rather than a dev-facing list.
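
Concretely, this is the kind of change those apps would have to make - a
rough sketch only, with placeholder user/group/path names:

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class UgiMigrationSketch {
      public static void main(String[] args) throws Exception {
        // Old 0.20-style impersonation: set the config key before creating
        // the FileSystem (only honored when security is off).
        Configuration oldConf = new Configuration();
        oldConf.set("hadoop.job.ugi", "alice,users");  // "user,group" - placeholder values
        FileSystem.get(oldConf).mkdirs(new Path("/user/alice/out"));

        // doAs style: run the same call inside a UGI context instead. On a
        // non-secure cluster createRemoteUser() gives the same effect; with
        // security on you need real credentials or proxy-user configuration.
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("alice");
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
          public Void run() throws Exception {
            FileSystem.get(new Configuration()).mkdirs(new Path("/user/alice/out"));
            return null;
          }
        });
      }
    }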

Another use case I hit a lot on non-secure clusters is: hadoop
fs -Dhadoop.job.ugi=hadoop,hadoop <something I want to do as a superuser>.
The permissions model we have in 0.20 obviously isn't secure, but it's nice
for avoiding accidental mistakes, and making it easy to "sudo" like that is
handy.

Regardless of our particular opinions, isn't our policy that we cannot break
API compatibility between versions without a one-version deprecation period?
I see this as an important API (even if it isn't one we like), and breaking
it without such a transition period goes against our own rules. Like you said,
doAs() isn't any harder, but we need to give people a grace period to switch
over, and we probably need to write some command line tools to allow fs
operations as the superuser, etc.
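
For the command line piece, what I'm picturing is roughly the sketch below -
a hypothetical wrapper (class name and argument handling made up for
illustration) that runs FsShell inside a doAs for the named user. On a
non-secure cluster that gets you the old -Dhadoop.job.ugi behavior; with
security on it would need real credentials or proxy-user configuration
behind it:

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FsShell;
    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical "sudo for hadoop fs": usage would be something like
    //   hadoop SudoFsShell <user> <fs args...>
    public class SudoFsShell {
      public static void main(String[] args) throws Exception {
        final String user = args[0];
        final String[] shellArgs = new String[args.length - 1];
        System.arraycopy(args, 1, shellArgs, 0, shellArgs.length);

        UserGroupInformation ugi = UserGroupInformation.createRemoteUser(user);
        int rc = ugi.doAs(new PrivilegedExceptionAction<Integer>() {
          public Integer run() throws Exception {
            // FsShell is a Tool, so ToolRunner still handles generic options.
            return ToolRunner.run(new Configuration(), new FsShell(), shellArgs);
          }
        });
        System.exit(rc);
      }
    }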

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
