HDFS-5126 <https://issues.apache.org/jira/browse/HDFS-5126> has been created for HDFS user impersonation, and I will develop a prototype in a few weeks.
Thanks,
Erik.fang

On Tue, Aug 20, 2013 at 3:07 PM, Erik fang <fme...@gmail.com> wrote:
> Hi folks,
>
> HDFS has a POSIX-like permission model, using R, W, X and owner, group,
> other for access control. It works well most of the time, except in two cases:
>
> 1. Data needs to be shared among users
>
> Groups can be used for access control, but then the users have to be in the
> same GROUP as the data. The GROUP here stands for the sharing relationship
> between users and data. If many sharing relationships exist, there are
> many groups, which are hard to manage.
>
> 2. Hive
>
> Hive uses a table-based access control model: a user can have SELECT,
> UPDATE, CREATE, or DROP privileges on a certain table, which map to R/W
> permissions in HDFS. However, Hive's table-based authorization doesn't match
> HDFS's POSIX-like model. For Hive users accessing HDFS, group permissions
> can be deployed, but that introduces either many groups or big groups that
> contain many sharing relationships.
>
> Inspired by the way RDBMSs manage data, directory-level access control
> based on authorized user impersonation can be implemented as an extension
> to the POSIX-like permission model.
>
> It consists of:
>
> 1. ACLFileSystem
>
> 2. Authorization manager: holds access control information and a shared
> secret with the namenode
>
> 3. Authenticator (embedded in the namenode)
>
> Take Hive as an example, where the owner of the data is user DW. The
> procedure is:
>
> 1. A user submits a Hive query or an HCatalog job to access DW's data. From
> it we can get the tables/partitions to be read and written, and the
> corresponding HDFS paths. Then an RPC call to the authorization manager is
> invoked, sending
>
> {user, tablename, table_path, w/r}
>
> 2. The authorization manager performs an authorization check to find whether
> the access is allowed. If it is, it replies with an encrypted table path:
>
> {realuser, encrypted(tablepath+w/r)}
>
> realuser here stands for the owner of the requested data.
>
> 3. ACLFileSystem extends FileSystem. When an open(path) call is invoked, it
> replaces the path with encrypted(tablepath+w/r) and invokes the namenode
> RPC call, such as
>
> open(realuser, encrypted(tablepath+w/r), null)
>
> If the user is requesting a partition path, the RPC call can be invoked as
>
> open(realuser, encrypted(tablepath+w/r), path_suffix)
>
> 4. The namenode picks up the RPC call and decrypts encrypted(tablepath+w/r)
> with the shared secret to verify whether it is forged. If it is genuine, the
> namenode checks the w/r operation, joins tablepath and path_suffix, and
> invokes the call as the owner of that HDFS path, for example user DW.
>
> A delegation token or something similar can be used as the shared secret,
> and the authorization manager can be integrated into the Hive metastore.
>
> In short, I propose an HDFS user impersonation mechanism and an
> authorization mechanism based on HDFS user impersonation.
>
> If the community is interested, I will file a JIRA for HDFS user
> impersonation and a JIRA for the authorization manager soon.
>
> Thoughts?
>
> Thanks a lot
> Erik.fang
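A minimal Java sketch of the token round-trip in steps 2 and 4 (authorization manager issues {realuser, encrypted(tablepath+w/r)}, namenode verifies it and joins path_suffix) may make the proposal concrete. Everything here is hypothetical: the names `ImpersonationTokenSketch`, `AuthorizationManager`, `seal`, `unseal` and `resolve`, the in-memory ACL keyed on table path, and the token layout are illustrations, not HDFS or Hive APIs, and the sketch does not touch `FileSystem` or the namenode RPC layer. For "encrypted" it assumes AES-GCM from the standard `javax.crypto` package, an authenticated cipher, so decryption itself fails on a forged or altered token. It also binds realuser as associated data, which is an addition not in the proposal, so a client cannot pair a valid token with a different owner.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch of the encrypted(tablepath+w/r) round-trip; not an HDFS or Hive API. */
public final class ImpersonationTokenSketch {
    private static final int IV_BYTES = 12;  // 96-bit GCM nonce
    private static final int TAG_BITS = 128; // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts "mode|tablePath" with AES-GCM, binding realUser as associated data. */
    static String seal(SecretKey key, String realUser, String tablePath, String mode)
            throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        cipher.updateAAD(realUser.getBytes(StandardCharsets.UTF_8));
        byte[] ct = cipher.doFinal((mode + "|" + tablePath).getBytes(StandardCharsets.UTF_8));
        byte[] token = ByteBuffer.allocate(IV_BYTES + ct.length).put(iv).put(ct).array();
        return Base64.getUrlEncoder().encodeToString(token);
    }

    /** Decrypts a token into {mode, tablePath}; throws if it was forged or realUser differs. */
    static String[] unseal(SecretKey key, String realUser, String token)
            throws GeneralSecurityException {
        byte[] in = Base64.getUrlDecoder().decode(token);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
        cipher.updateAAD(realUser.getBytes(StandardCharsets.UTF_8));
        byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8).split("\\|", 2);
    }

    /** Steps 1-2: hypothetical authorization manager holding ACLs and the shared secret. */
    static final class AuthorizationManager {
        private final SecretKey secret;
        private final Map<String, String> ownerByPath = new HashMap<>();
        private final Map<String, String> grants = new HashMap<>(); // "user path" -> "r", "w" or "rw"
        AuthorizationManager(SecretKey secret) { this.secret = secret; }
        void addTable(String owner, String tablePath) { ownerByPath.put(tablePath, owner); }
        void grant(String user, String tablePath, String modes) { grants.put(user + " " + tablePath, modes); }

        /** Returns {realuser, token}, or null if the request is not allowed. */
        String[] authorize(String user, String tablePath, String mode) throws GeneralSecurityException {
            String owner = ownerByPath.get(tablePath);
            String granted = grants.getOrDefault(user + " " + tablePath, "");
            boolean validMode = "r".equals(mode) || "w".equals(mode);
            if (owner == null || !validMode || !granted.contains(mode)) {
                return null;
            }
            return new String[] {owner, seal(secret, owner, tablePath, mode)};
        }
    }

    /** Step 4 on the namenode: verify the token, check w/r, and join tablepath and path_suffix. */
    static String resolve(SecretKey secret, String realUser, String token, String pathSuffix, String op)
            throws GeneralSecurityException {
        String[] modeAndPath = unseal(secret, realUser, token); // AEADBadTagException if forged
        if (!modeAndPath[0].equals(op)) {
            throw new SecurityException("token grants '" + modeAndPath[0] + "', not '" + op + "'");
        }
        if (pathSuffix == null) {
            return modeAndPath[1];
        }
        if (pathSuffix.contains("..")) {
            throw new SecurityException("path_suffix must stay inside the table directory");
        }
        return modeAndPath[1] + "/" + pathSuffix;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey secret = gen.generateKey(); // stands in for the shared secret
        AuthorizationManager am = new AuthorizationManager(secret);
        am.addTable("DW", "/warehouse/sales");
        am.grant("alice", "/warehouse/sales", "r");
        String[] reply = am.authorize("alice", "/warehouse/sales", "r");
        System.out.println(reply[0]); // DW
        System.out.println(resolve(secret, reply[0], reply[1], "dt=2013-08-20", "r")); // /warehouse/sales/dt=2013-08-20
        System.out.println(am.authorize("alice", "/warehouse/sales", "w") == null); // true
        try {
            resolve(secret, "bob", reply[1], null, "r"); // same token, different realuser
        } catch (GeneralSecurityException e) {
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
    }
}
```

The `..` check on path_suffix is another assumption of this sketch: because the namenode runs the call as the data owner, the suffix should not be able to walk out of the authorized table directory.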