Larry and all,

Apologies for not responding sooner. I have read your proposals and thought 
about how we can collaborate well and speed things up for all of us. From the 
community discussions around the Hadoop Summit, TokenAuth should be a pluggable 
full stack that accommodates different implementations. HADOOP-9392 reflects 
that thinking, and the breakdown attached to that JIRA came out of it. To 
simplify the discussion I will try to illustrate it here at a very high level 
as follows.

Simply put, we would have:
TokenAuth = TokenAuth framework + TokenAuth implementation (HAS) + TokenAuth 
integration

= TokenAuth framework =
This defines TokenAuth as the desired pluggable framework: it defines and 
provides the required APIs, protocols, flows, and facilities, along with common 
implementations of related constructs, entities, and even services. The 
framework is subject to continued discussion and should be defined together as 
a common effort of the community. It's important that the framework be 
pluggable in all the key places so that particular solutions can supply their 
own product-level implementations. Based on this framework, we could build the 
HAS implementation. Initially, we have the following items to think about in 
order to define the relevant APIs and provide the core facilities for the 
framework; the list is to be extended (a sketch of item 1 follows the list).
1. Common token definition;
2. TokenAuthn method for Hadoop RPC;
3. Authentication Service;
4. Identity Token Service;
5. Access Token Service;
6. Fine grained authorization;
7. Attribute Service;
8. Token authentication client;
9. Token cache;
10. Common configuration across TokenAuth;
11. Hadoop token command;
12. Key Provider;
13. Web SSO support;
14. REST SSO support;
15. Auditing support.
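
To make item 1 above a bit more concrete, here is a minimal sketch of what the 
common token definition might look like. Everything in it (the interface name, 
the fields, and the wire-form/signing hook) is illustrative only and not a 
proposed final API; agreeing on the real definition is exactly the framework 
work described here.

import java.util.Map;

// Illustrative only -- one possible shape for the common token of item 1.
// The name, fields and the signing hook are assumptions, not a final API.
public interface HadoopToken {
  String getTokenType();               // e.g. "identity" or "access" (assumed type names)
  String getPrincipal();               // the authenticated principal the token speaks for
  long getIssueTime();                 // issue time, milliseconds since the epoch
  long getExpiryTime();                // expiry time, milliseconds since the epoch
  Map<String, String> getAttributes(); // extension point for attributes (item 7)
  byte[] toBytes();                    // wire form, signed/verified via the Key Provider (item 12)
}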

= TokenAuth implementation (HAS) =
This defines and implements the Hadoop AuthN/AuthZ Server (HAS) based on the 
TokenAuth framework. HAS is a centralized server that addresses AAA 
(Authentication, Authorization, Auditing) concerns for Hadoop across the 
ecosystem. The 'A' of HAS could stand for "Authentication", "Authorization", or 
"Auditing", depending on which role(s) HAS is provisioned with. HAS is a 
complete, enterprise-ready security solution built on the TokenAuth framework 
that utilizes the common facilities the framework provides. It customizes and 
provides all the implementations of the constructs, entities, and services 
defined in the framework that are required for enterprise deployments. 
Initially we have the following for the implementation (a sketch of item 2 
follows the list):
1. Provide common and management facilities, including a configuration 
loading/syncing mechanism, auditing and logging support, a shared 
high-availability approach, REST support, and so on;
2. Implement the Authentication Server role for HAS, covering the 
Authentication Service and Identity Token Service defined in the framework. The 
authentication engine can be configured with a chain of authentication modules 
to support multi-factor authentication. In particular, it will support LDAP 
authentication;
3. Implement the Authorization Server role for HAS, covering the Access Token 
Service;
4. Implement centralized administration of fine-grained authorization for the 
Authorization Server role. Optional in the initial iteration;
5. Implement the Attribute Service for HAS, to allow integration of third-party 
attribute authorities. Optional in the initial iteration;
6. Provide an authorization enforcement library for Hadoop services to enforce 
security policies, utilizing the related services provided by the Authorization 
Server. Optional in the initial iteration.
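
For item 2, the following rough sketch (not HAS code) shows one way the 
authentication engine could chain JAAS login modules for multi-factor 
authentication with LDAP as one factor. The OtpLoginModule class and the LDAP 
URL are made-up placeholders; only the JAAS machinery itself is standard.

import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

// Sketch only: a chained JAAS configuration for the HAS authentication engine.
public class HasAuthnEngineSketch {
  public void authenticate(CallbackHandler callbacks) throws LoginException {
    Configuration chain = new Configuration() {
      @Override
      public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        Map<String, Object> ldapOptions = new HashMap<String, Object>();
        // placeholder directory; a real deployment would configure this
        ldapOptions.put("userProvider", "ldap://ldap.example.com/ou=people,dc=example,dc=com");
        return new AppConfigurationEntry[] {
          // hypothetical second factor, OPTIONAL so single-factor setups still work
          new AppConfigurationEntry("org.example.OtpLoginModule",
              LoginModuleControlFlag.OPTIONAL, new HashMap<String, Object>()),
          // JDK-provided LDAP module for the username/password factor
          new AppConfigurationEntry("com.sun.security.auth.module.LdapLoginModule",
              LoginModuleControlFlag.REQUIRED, ldapOptions)
        };
      }
    };
    // only when the REQUIRED modules succeed would HAS go on to issue an identity token
    new LoginContext("HAS", null, callbacks, chain).login();
  }
}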

= TokenAuth integration =
This covers tasks that employ the TokenAuth framework and the relevant 
implementation(s) to enable the related support in various Hadoop components 
across the ecosystem for typical enterprise deployments. Currently we have the 
following in mind (a sketch of item 1 follows the list):
1. Enable the Web SSO flow for web interfaces such as the HDFS and YARN web 
UIs;
2. Enable the REST SSO flow for REST interfaces such as Oozie;
3. Add Thrift and Hive JDBC support using TokenAuth. We include this because it 
is an important interface for enterprises to interact with their data;
4. Enable access to ZooKeeper using TokenAuth, since it is widely used as the 
coordinator across the ecosystem.
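
For item 1, a rough sketch of how the Web SSO flow could hook into the existing 
web UI pluggability point (the hadoop.http.filter.initializers property) 
follows. FilterInitializer and FilterContainer are the existing Hadoop hooks; 
the TokenAuthFilter class name and the configuration keys are placeholders for 
whatever the framework ends up defining.

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.http.FilterContainer;
import org.apache.hadoop.http.FilterInitializer;

// Sketch only: registers a hypothetical TokenAuth web SSO filter on a daemon's web UI.
public class TokenAuthFilterInitializer extends FilterInitializer {
  @Override
  public void initFilter(FilterContainer container, Configuration conf) {
    Map<String, String> params = new HashMap<String, String>();
    // placeholder configuration keys for the SSO endpoint and token verification key
    params.put("sso.authentication.server.url",
        conf.get("hadoop.security.tokenauth.server.url"));
    params.put("sso.token.verification.key",
        conf.get("hadoop.security.tokenauth.verification.key"));
    // protect every servlet on the daemon's web UI (HDFS, YARN, ...)
    container.addGlobalFilter("TokenAuthFilter",
        "org.apache.hadoop.security.tokenauth.TokenAuthFilter", params);
  }
}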

I regard decoupling the pluggable framework from any specific implementation 
as important: we are addressing similar requirements, but we have different 
implementation considerations in the approaches represented by HADOOP-9392 and 
HADOOP-9533. For example, to support pluggable authentication, HADOOP-9392 
prefers JAAS-based authentication modules while HADOOP-9533 suggests using 
Apache Shiro. With this decoupling we could best collaborate and contribute. As 
far as I understood, you might agree with this approach, as suggested by your 
recent email: "decouple the pluggable framework from any specific central 
server implementation". If I understood you correctly, do you think that for 
the initial iteration we have to have two central servers, a HAS server and an 
HSSO server? If not, would it work for us to have HAS be a community effort, 
like the TokenAuth framework, with both of us contributing to the 
implementation?

To proceed, I will try to align our plans, complementing your proposal and 
addressing your concerns as follows.

= Iteration Endstate =
Besides what you mentioned from the user's view, how about adding this 
consideration: the initial iteration would also lay down the base TokenAuth 
framework, with well-defined APIs, protocols, flows, and core facilities for 
implementations. The framework should avoid rework and big changes for future 
implementations.

= Terminology and Naming =
It would be great if we can unify the related terminology in this effort, at 
least at the framework level. This could probably be achieved in the process of 
defining the relevant APIs for the TokenAuth framework.

= Project scope =
It's great that we have the common in-scope list for the first iteration, as 
you mentioned:
Usecases:
client types: REST, CLI, UI
authentication types: Simple, Kerberos, authentication/LDAP, federation/SAML

We might also consider OAuth 2.0 support. Please note that by defining this 
in-scope list we know what is required as a must-have in the iteration, as an 
expression of our consensus; however, it should not prevent any interested 
parties from contributing more in the meantime, as long as that is appropriate 
at the time.

= Branch =
As you mentioned, we may have different branches for different features, with 
the eventual merge in mind. Another approach is just having one branch for the 
relevant security features; the review and merge work can still be JIRA based.

1. Based on your proposal, how about the following as the scope of the 
branch(es):
1) Pluggable Authentication and Token based SSO
2) CryptoFS for volume level encryption (HCFS)
3) Pluggable UGI change
4) Key management system
5) Unified authorization

2. With the above scope in mind, a candidate branch name could be 
'security-branch' instead of 'tokenauth-branch'. How about creating the branch 
now, if there are no other concerns?

3. Check-in philosophy. I agree with your proposal, with slight concerns:
In terms of check-in philosophy, we should take a review then check-in approach 
to the branch with lazy consensus - wherein we do not need to explicitly +1 
every check-in to the branch but we will honor any -1's with discussion to 
resolve before checking in. This will provide us each with the opportunity to 
track the work being done and ensure that we understand it and find that it 
meets the intended goals.

We might need an explicit +1; otherwise we would need to define a time window 
to wait before checking in.
One issue we would like to clarify: does voting also include the security 
branch committers?

= JIRA =
We might not need an additional umbrella JIRA for now, since we already have 
HADOOP-9392 and HADOOP-9533. By the way, I would suggest we use the existing 
feature JIRAs to discuss relevant and specific issues as we go. Leveraging 
these JIRAs we can avoid too much detail in the common-dev thread, and it also 
makes it easy to track the relevant discussions.

I agree it's a good point to start with an inventory of the existing JIRAs. We 
can do that if there are no other concerns. We will then provide the full list 
of breakdown JIRAs and attach it to HADOOP-9392 for further collaboration.

Regards,
Kai

From: larry mccay [mailto:larry.mc...@gmail.com]
Sent: Wednesday, September 18, 2013 6:27 AM
To: Zheng, Kai; Chen, Haifeng; common-dev@hadoop.apache.org
Subject: Re: [DISCUSS] Security Efforts and Branching

All -

I apologize for not following up sooner. I have been heads down on some other 
matters that required my attention.

It seems that it may be easier to move forward by gaining consensus a little 
bit at a time rather than trying to hit the ground running where the other 
thread left off.

Would it be agreeable to everyone to start with an inventory of the existing 
Jiras that have patches available or nearly available so that we can determine 
what concrete bits we have to start with?

Once we get that done, we can try and frame a set of goals to make up the 
initial iteration and determine what from the inventory will be leveraged in 
that iteration.

Does this sound reasonable to everyone?
Would anyone like to propose another starting point?

thanks,

--larry

On Wed, Sep 4, 2013 at 4:26 PM, larry mccay 
<larry.mc...@gmail.com<mailto:larry.mc...@gmail.com>> wrote:
It doesn't look like the PDF made it all the way through to the archives and 
maybe even to recipients - so the following is the text version of the 
iteration-1 draft:

Iteration 1: Pluggable User Authentication and Federation

Introduction
The intent of this effort is to bootstrap the development of pluggable 
token-based authentication mechanisms to support certain goals of enterprise 
authentication integrations. By restricting the scope of this effort, we hope 
to provide immediate benefit to the community while keeping the initial 
contribution to a manageable size that can be easily reviewed, understood and 
extended with further development through follow up JIRAs and related 
iterations.

Iteration Endstate
Once complete, this effort will have extended the authentication mechanisms - 
for all client types - from the existing: Simple, Kerberos and Plain (for RPC) 
to include LDAP authentication and SAML based federation. In addition, the 
ability to provide additional/custom authentication mechanisms will be enabled 
for users to plug in their preferred mechanisms.

Project Scope
The scope of this effort is a subset of the features covered by the overviews 
of HADOOP-9392 and HADOOP-9533. This effort concentrates on enabling Hadoop to 
issue, accept/validate SSO tokens of its own. The pluggable authentication 
mechanism within SASL/RPC layer and the authentication filter pluggability for 
REST and UI components will be leveraged and extended to support the results of 
this effort.

Out of Scope
In order to scope the initial deliverable as the minimally viable product, a 
handful of things have been simplified or left out of scope for this effort. 
This is not meant to say that these aspects are not useful or not needed but 
that they are not necessary for this iteration. We do however need to ensure 
that we don't do anything to preclude adding them in future iterations.
1. Additional Attributes - the result of authentication will continue to use 
the existing hadoop tokens and identity representations. Additional attributes 
used for finer grained authorization decisions will be added through follow-up 
efforts.
2. Token revocation - the ability to revoke issued identity tokens will be 
added later
3. Multi-factor authentication - this will likely require additional attributes 
and is not necessary for this iteration.
4. Authorization changes - we will require additional attributes for the 
fine-grained access control plans. This is not needed for this iteration.
5. Domains - we assume a single flat domain for all users
6. Kinit alternative - we can leverage existing REST clients such as cURL to 
retrieve tokens through authentication and federation for the time being
7. A specific authentication framework isn't really necessary within the REST 
endpoints for this iteration. If one is available then we can use it; otherwise 
we can leverage existing things like Apache Shiro within a servlet filter, as 
sketched below.
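
As a rough illustration of the fallback mentioned in item 7, a servlet filter 
could hand BASIC credentials to Shiro along the following lines. The filter 
itself and the credential handling are only a sketch; Shiro's realm and 
SecurityManager configuration (e.g. an LDAP realm) is assumed to be set up 
elsewhere and is omitted.

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.codec.binary.Base64;
import org.apache.shiro.SecurityUtils;
import org.apache.shiro.authc.UsernamePasswordToken;
import org.apache.shiro.subject.Subject;

// Sketch only: a BASIC-auth servlet filter that delegates credential checking to Shiro.
public class ShiroBasicAuthFilter implements Filter {
  @Override public void init(FilterConfig config) {}
  @Override public void destroy() {}

  @Override
  public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest request = (HttpServletRequest) req;
    HttpServletResponse response = (HttpServletResponse) resp;
    Subject subject = SecurityUtils.getSubject();
    if (!subject.isAuthenticated()) {
      String header = request.getHeader("Authorization");
      if (header == null || !header.startsWith("Basic ")) {
        response.sendError(HttpServletResponse.SC_UNAUTHORIZED);
        return;
      }
      // decode "Basic base64(user:password)" and let the configured Shiro realm verify it
      String decoded = new String(
          Base64.decodeBase64(header.substring("Basic ".length())), "UTF-8");
      String[] creds = decoded.split(":", 2);
      try {
        subject.login(new UsernamePasswordToken(creds[0], creds.length > 1 ? creds[1] : ""));
      } catch (org.apache.shiro.authc.AuthenticationException e) {
        response.sendError(HttpServletResponse.SC_UNAUTHORIZED);
        return;
      }
    }
    chain.doFilter(req, resp);
  }
}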

In Scope
What is in scope for this effort is defined by the usecases described below. 
Components required for supporting the usecases are summarized for each client 
type. Each component is a candidate for a JIRA subtask - though multiple 
components are likely to be included in a JIRA to represent a set of 
functionality rather than individual JIRAs per component.

Terminology and Naming
The terms and names of components within this document are merely descriptive 
of the functionality that they represent. Any similarity or difference in names 
or terms from those found in other documents is not intended to make any 
statement about those other documents or the descriptions within. This document 
represents the pluggable authentication mechanisms and server functionality 
required to replace Kerberos.

Ultimately, the naming of the implementation classes will be a product of the 
patches accepted by the community.

Usecases:
client types: REST, CLI, UI
authentication types: Simple, Kerberos, authentication/LDAP, federation/SAML

Simple and Kerberos
Simple and Kerberos usecases continue to work as they do today. 
Authentication/LDAP and Federation/SAML are added through the existing 
pluggability points, either as they are or with the required extensions. Either 
way, continued support for Simple and Kerberos must not require changes to 
existing deployments in the field as a result of this effort.

REST
USECASE REST-1 Authentication/LDAP:
For REST clients, we will provide the ability to:
1. use cURL to Authenticate via LDAP through an IdP endpoint exposed by an 
AuthenticationServer instance via REST calls to:
   a. authenticate - passing username/password returning a hadoop id_token
   b. get-access-token - from the TokenGrantingService by passing the hadoop 
id_token as an Authorization: Bearer token along with the desired service name 
(master service name) returning a hadoop access token
2. Successfully invoke a hadoop service REST API passing the hadoop access 
token through an HTTP header as an Authorization Bearer token
   a. validation of the incoming token on the service endpoint is accomplished 
by an SSOAuthenticationHandler
3. Successfully block access to a REST resource when presenting a hadoop access 
token intended for a different service
   a. validation of the incoming token on the service endpoint is accomplished 
by an SSOAuthenticationHandler

USECASE REST-2 Federation/SAML:
We will also provide federation capabilities for REST clients such that:
1. acquire SAML assertion token from a trusted IdP (shibboleth?) and persist in 
a permissions protected file - ie. ~/.hadoop_tokens/.idp_token
2. use cURL to Federate a token from a trusted IdP through an SP endpoint 
exposed by an AuthenticationServer(FederationServer?) instance via REST calls 
to:
   a. federate - passing a SAML assertion as an Authorization: Bearer token 
returning a hadoop id_token
      - can copy and paste from the commandline or use cat to include the 
persisted token through --header "Authorization: Bearer $(cat 
~/.hadoop_tokens/.id_token)"
   b. get-access-token - from the TokenGrantingService by passing the hadoop 
id_token as an Authorization: Bearer token along with the desired service name 
(master service name) returning a hadoop access token
3. Successfully invoke a hadoop service REST API passing the hadoop access 
token through an HTTP header as an Authorization Bearer token
   a. validation of the incoming token on the service endpoint is accomplished 
by an SSOAuthenticationHandler
4. Successfully block access to a REST resource when presenting a hadoop access 
token intended for a different service
   a. validation of the incoming token on the service endpoint is accomplished 
by an SSOAuthenticationHandler
REQUIRED COMPONENTS for REST USECASES:
COMP-1. REST client - cURL or similar
COMP-2. REST endpoint for BASIC authentication to LDAP - IdP endpoint example - 
returning hadoop id_token
COMP-3. REST endpoint for federation with SAML Bearer token - shibboleth 
SP?|OpenSAML? - returning hadoop id_token
COMP-4. REST TokenGrantingServer endpoint for acquiring hadoop access tokens 
from hadoop id_tokens
COMP-5. SSOAuthenticationHandler to validate incoming hadoop access tokens
COMP-6. some source of a SAML assertion - shibboleth IdP?
COMP-7. hadoop token and authority implementations
COMP-8. core services for crypto support for signing, verifying and PKI 
management
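
As a rough sketch of COMP-5, an SSOAuthenticationHandler could plug into the 
existing hadoop-auth AuthenticationFilter roughly as follows. 
AuthenticationHandler and AuthenticationToken are the existing hadoop-auth 
types; the property name, the token parsing/verification and the 
HadoopAccessToken placeholder are assumptions about what COMP-7/COMP-8 will 
provide.

import java.io.IOException;
import java.util.Properties;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.hadoop.security.authentication.client.AuthenticationException;
import org.apache.hadoop.security.authentication.server.AuthenticationHandler;
import org.apache.hadoop.security.authentication.server.AuthenticationToken;

// Sketch of COMP-5: validates an incoming hadoop access token presented as a Bearer header.
public class SSOAuthenticationHandler implements AuthenticationHandler {
  public static final String TYPE = "sso";
  private String expectedService;   // the service this endpoint represents

  @Override public String getType() { return TYPE; }

  @Override
  public void init(Properties config) throws ServletException {
    expectedService = config.getProperty("sso.expected.service.name"); // assumed property
  }

  @Override public void destroy() {}

  @Override
  public AuthenticationToken authenticate(HttpServletRequest request,
      HttpServletResponse response) throws IOException, AuthenticationException {
    String header = request.getHeader("Authorization");
    if (header == null || !header.startsWith("Bearer ")) {
      response.setStatus(HttpServletResponse.SC_UNAUTHORIZED);
      return null; // AuthenticationFilter treats null as "not yet authenticated"
    }
    String accessToken = header.substring("Bearer ".length());
    // verifyAndDecode() stands in for COMP-7/COMP-8: signature check, expiry check,
    // and extraction of the principal and target service from the token
    HadoopAccessToken token = verifyAndDecode(accessToken);
    if (!expectedService.equals(token.getService())) {
      // blocks the usecase where a token intended for a different service is presented
      throw new AuthenticationException("Access token not issued for this service");
    }
    return new AuthenticationToken(token.getPrincipal(), token.getPrincipal(), TYPE);
  }

  // --- placeholders, not real APIs ---
  private HadoopAccessToken verifyAndDecode(String wireToken) throws AuthenticationException {
    throw new UnsupportedOperationException("token verification to be defined by the framework");
  }
  private interface HadoopAccessToken { String getPrincipal(); String getService(); }
}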

CLI
USECASE CLI-1 Authentication/LDAP:
For CLI/RPC clients, we will provide the ability to:
1. use cURL to Authenticate via LDAP through an IdP endpoint exposed by an 
AuthenticationServer instance via REST calls to:
   a. authenticate - passing username/password returning a hadoop id_token
      - for RPC clients we need to persist the returned hadoop identity token 
in a file protected by fs permissions so that it may be leveraged until expiry
      - directing the returned response to a file may suffice for now, 
something like ">~/.hadoop_tokens/.id_token"
2. use hadoop CLI to invoke RPC API on a specific hadoop service
   a. RPC client negotiates a TokenAuth method through the SASL layer; the 
hadoop id_token is retrieved from ~/.hadoop_tokens/.id_token and passed as an 
Authorization: Bearer token to the get-access-token REST endpoint exposed by 
the TokenGrantingService, returning a hadoop access token
   b. RPC server side validates the presented hadoop access token and continues 
to serve request
   c. Successfully invoke a hadoop service RPC API

USECASE CLI-2 Federation/SAML:
For CLI/RPC clients, we will provide the ability to:
1. acquire SAML assertion token from a trusted IdP (shibboleth?) and persist in 
a permissions protected file - ie. ~/.hadoop_tokens/.idp_token
2. use cURL to Federate a token from a trusted IdP through an SP endpoint 
exposed by an AuthenticationServer(FederationServer?) instance via REST calls 
to:
   a. federate - passing a SAML assertion as an Authorization: Bearer token 
returning a hadoop id_token
      - can copy and paste from the commandline or use cat to include the 
previously persisted token through --header "Authorization: Bearer $(cat 
~/.hadoop_tokens/.id_token)"
3. use hadoop CLI to invoke RPC API on a specific hadoop service
   a. RPC client negotiates a TokenAuth method through the SASL layer; the 
hadoop id_token is retrieved from ~/.hadoop_tokens/.id_token and passed as an 
Authorization: Bearer token to the get-access-token REST endpoint exposed by 
the TokenGrantingService, returning a hadoop access token
   b. RPC server side validates the presented hadoop access token and continues 
to serve request
   c. Successfully invoke a hadoop service RPC API

REQUIRED COMPONENTS for CLI USECASES - (beyond those required for REST):
COMP-9. TokenAuth Method negotiation, etc
COMP-10. Client side implementation to leverage REST endpoint for acquiring 
hadoop access tokens given a hadoop id_token
COMP-11. Server side implementation to validate incoming hadoop access tokens
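
A rough sketch of COMP-10 follows: a client-side helper that trades the 
persisted hadoop id_token for a hadoop access token over REST. The endpoint 
path, query parameter and single-line response body are assumptions about the 
TokenGrantingService contract, not a defined API.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Sketch of COMP-10: exchange the persisted id_token for an access token.
public class AccessTokenClientSketch {
  public String getAccessToken(String tgsBaseUrl, String serviceName) throws IOException {
    // the id_token persisted by the authenticate/federate step (CLI-1/CLI-2 above)
    String idToken = readFirstLine(new File(System.getProperty("user.home"),
        ".hadoop_tokens/.id_token"));
    URL url = new URL(tgsBaseUrl + "/get-access-token?service="
        + URLEncoder.encode(serviceName, "UTF-8"));
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Authorization", "Bearer " + idToken);
    if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
      throw new IOException("get-access-token failed: " + conn.getResponseCode());
    }
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
    try {
      return in.readLine(); // assume the access token comes back as a single line
    } finally {
      in.close();
    }
  }

  private static String readFirstLine(File f) throws IOException {
    BufferedReader r = new BufferedReader(new FileReader(f));
    try {
      return r.readLine().trim();
    } finally {
      r.close();
    }
  }
}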

UI
Various Hadoop services have their own web UI consoles for administration and 
end user interactions. These consoles need to also benefit from the 
pluggability of authentication mechanisms to be on par with the access control 
of the cluster REST and RPC APIs.
Web consoles are protected with a WebSSOAuthenticationHandler which will be 
configured for either authentication or federation.

USECASE UI-1 Authentication/LDAP:
For the authentication usecase:
1. User's browser requests access to a UI console page
2. WebSSOAuthenticationHandler intercepts the request and redirects the browser 
to an IdP web endpoint exposed by the AuthenticationServer passing the 
requested url as the redirect_url
3. IdP web endpoint presents the user with a FORM over https
   a. user provides username/password and submits the FORM
4. AuthenticationServer authenticates the user with provided credentials 
against the configured LDAP server and:
   a. leverages a servlet filter or other authentication mechanism for the 
endpoint and authenticates the user with a simple LDAP bind with username and 
password
   b. acquires a hadoop id_token and uses it to acquire the required hadoop 
access token which is added as a cookie
   c. redirects the browser to the original service UI resource via the 
provided redirect_url
5. WebSSOAuthenticationHandler for the original UI resource interrogates the 
incoming request again for an authcookie that contains an access token and, 
upon finding one:
   a. validates the incoming token
   b. returns the AuthenticationToken as per AuthenticationHandler contract
   c. AuthenticationFilter adds the hadoop auth cookie with the expected token
   d. serves requested resource for valid tokens
   e. subsequent requests are handled by the AuthenticationFilter recognition 
of the hadoop auth cookie

USECASE UI-2 Federation/SAML:
For the federation usecase:
1. User's browser requests access to a UI console page
2. WebSSOAuthenticationHandler intercepts the request and redirects the browser 
to an SP web endpoint exposed by the AuthenticationServer passing the requested 
url as the redirect_url. This endpoint:
   a. is dedicated to redirecting to the external IdP passing the required 
parameters which may include a redirect_url back to itself as well as encoding 
the original redirect_url so that it can determine it on the way back to the 
client
3. the IdP:
   a. challenges the user for credentials and authenticates the user
   b. creates appropriate token/cookie and redirects back to the 
AuthenticationServer endpoint
4. AuthenticationServer endpoint:
   a. extracts the expected token/cookie from the incoming request and 
validates it
   b. creates a hadoop id_token
   c. acquires a hadoop access token for the id_token
   d. creates appropriate cookie and redirects back to the original 
redirect_url - being the requested resource
5. WebSSOAuthenticationHandler for the original UI resource interrogates the 
incoming request again for an authcookie that contains an access token and, 
upon finding one:
   a. validates the incoming token
   b. returns the AuthenticationToken as per AuthenticationHandler contract
   c. AuthenticationFilter adds the hadoop auth cookie with the expected token
   d. serves requested resource for valid tokens
   e. subsequent requests are handled by the AuthenticationFilter recognition 
of the hadoop auth cookie
REQUIRED COMPONENTS for UI USECASES:
COMP-12. WebSSOAuthenticationHandler
COMP-13. IdP Web Endpoint within AuthenticationServer for FORM based login
COMP-14. SP Web Endpoint within AuthenticationServer for 3rd party token 
federation
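
As a rough sketch of the redirect step in COMP-12 (step 2 of the UI usecases), 
the handler could check for the SSO cookie and bounce the browser to the 
AuthenticationServer when it is missing. The cookie name, parameter name and 
login URL are illustrative; token validation itself would be shared with the 
SSOAuthenticationHandler sketched earlier.

import java.io.IOException;
import java.net.URLEncoder;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch only: the cookie-check/redirect portion of COMP-12.
public class WebSSORedirectSketch {
  private final String ssoLoginUrl; // IdP or SP web endpoint on the AuthenticationServer

  public WebSSORedirectSketch(String ssoLoginUrl) {
    this.ssoLoginUrl = ssoLoginUrl;
  }

  /** Returns the access token from the auth cookie, or redirects and returns null. */
  public String getTokenOrRedirect(HttpServletRequest request, HttpServletResponse response)
      throws IOException {
    Cookie[] cookies = request.getCookies();
    if (cookies != null) {
      for (Cookie c : cookies) {
        if ("hadoop-auth-sso".equals(c.getName())) {   // assumed cookie name
          return c.getValue();                         // validated by the handler afterwards
        }
      }
    }
    // no cookie: bounce the browser to the AuthenticationServer, remembering where to return
    String original = request.getRequestURL().toString();
    response.sendRedirect(ssoLoginUrl + "?redirect_url="
        + URLEncoder.encode(original, "UTF-8"));
    return null;
  }
}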

On Wed, Sep 4, 2013 at 3:06 PM, larry mccay 
<lmc...@apache.org<mailto:lmc...@apache.org>> wrote:
Hello Kai, Jerry and common-dev'ers -

I would like to try and get a game plan together for how we go about getting 
some of these larger security changes into branches that are manageable, 
reviewable and ultimately mergeable in a timely manner.

In order to even start this discussion, I think we need an inventory of the 
high level projects that are underway in parallel. We can then identify those 
that are at the point where patches can be used to seed a branch. This will 
give us some insight into how to break it into phases.

Off the top of my head, I can think of the following high level efforts:

1. Pluggable Authentication and Token based SSO
2. CryptoFS for volume level encryption
3. Hive Table/Column Level Encryption (admittedly this is Hive work but it will 
leverage common work done in Hadoop)
4. Authorization

Now, #1 and #2 above have related Jiras and a number of patches available and 
are therefore early contenders for branching.

#1 has a draft for an initial iteration that was discussed in another thread 
and I will attach a pdf version of the iteration-1 proposal to this mail.

I propose that we converge on an initial plan based on further discussion of 
the attached iteration and file a Jira to represent that iteration. We can then 
break down the larger patches on existing Jiras to fit into the constrained 
scope of the agreed upon iteration and attach them to subtasks of the iteration 
Jira.

We can then seed a Pluggable Authentication and Token based SSO branch with 
those related patches from H-9392, H-9534, H-9781.

Now, whether we introduce a whole central sso service in that branch is up for 
discussion but I personally think that it will violate the "keeping it small 
and manageable" goal. I am wondering whether a branch for security services 
would do well to decouple the consumers from a specific implementation that 
happens to be remote. Then within the Pluggable Authentication branch - we can 
concentrate on the consumer level and local implementations.

I assume that the CryptoFS work is also intended to be done within the branches 
and we have to therefore consider how to leverage common code for things like 
key access for encryption/decryption and signing/verifying. This sort of thing 
is being introduced by H-9534 as part of the Pluggable Authentication branch in 
support of JWT tokens. So, we will have to think through what branches are 
required for Crypto in the near term.

Perhaps, we can concentrate on those portions of crypto that will be of 
immediate benefit to iteration-1 and leave higher order CryptoFS stuff to 
another iteration? I don't think that we want an explosion of branches at any 
given time. If we can limit it to specific areas, close down on the iteration 
and get it merged before creating a new set of branches that would be best. 
Again, ease of review, test and merge is important for us.

I am curious how development across related branches like these would work 
though. If the service work needs to leverage work from the other, how do we 
do that easily? Can we branch a branch? Will that require both to be ready to 
merge at the same time?

Perhaps, low-level dependencies can be duplicated for some time and then 
consolidated later?

Anyway, specific questions:

Does the proposal to start with the attached iteration-1 draft to create an 
iteration Jira make sense to everyone?

Does anyone have specific suggestions regarding the best way for managing 
branches that should be decoupled but at the same time leverage common code?

Any other thoughts or insight?

thanks,

--larry



