I am curious about the community's thoughts here; this seems like something many enterprises would drool over with Hive... I am not a coder, so the level of coding involved in something like this is unknown to me.
On Sat, May 4, 2013 at 8:31 AM, John Omernik <j...@omernik.com> wrote:
> We were doing some tests this past week with Hive authorization. One of our current use "challenges" is when we have an underlying, well-managed and partitioned table, and we want to allow access to only certain columns in that table. Our first thoughts went to VIEWs, as that's a common use case with relational databases (i.e., set up a view with only the columns you want the user to access) and set the permissions appropriately.
>
> In testing, and this is not surprising given the "newness" of Hive authorization, a VIEW cannot be created so as to allow access to a table without also granting access to the underlying table, defeating the idea of the view as a tool to manage that access.
>
> So I wanted to put it to the user group: I've done some JIRA searching and didn't find anything (I will admit my JIRA search-fu is not stellar), but is there an option that could be thrown together in Hive that would allow that use case? Perhaps a configuration setting that would allow views to execute as a specific user (perhaps a global user, or perhaps a user specified at view creation). This would allow the "view" to have access to the underlying table, and since the view, once created, couldn't be changed by the user, you could grant "read" permission on the view to the user or group of users you want to have access.
>
> I suppose this has challenges, i.e., can a user just create a view to bypass table-level restrictions? Perhaps, if this model were taken, a privilege for CREATING/MODIFYING views could be created and granted only to a superuser of some sort. I am really just walking through ideas here, as this is one of the last stumbling blocks we have with Hive from an "Enterprise ready" point of view. Heck, if done right, you could almost do data masking at the view level. You have a column in your source data that is sensitive, so instead of returning that column you do an MD5 (can we have a native MD5 function? :) of that column, or you blank that column. If we put strong security on the creation and modification of views, and allow views to execute as a different user that has access to the source data, you have a powerful way to represent your data to all levels within your org.
>
> Also: since I am just brainstorming here, I'd love to hear what others may be doing in this area. Perhaps the Hive user community can come up with a strategic plan, while at the same time sharing some shorter-term workarounds.
>
> Thanks!
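
To make the masking idea in the quoted message a bit more concrete, here is a rough sketch of what such a view could look like. It uses Python's built-in sqlite3 module purely as a stand-in, not Hive, and SQLite has no users or grants, so it only shows the shape of a view that exposes some columns and hashes a sensitive one; it does not show the "execute as another user" permission model being asked for. The table `customers`, the view `customers_masked`, the column names, and the helper `md5_hex` are all hypothetical.

```python
import hashlib
import sqlite3


def md5_hex(value):
    """Return the hex MD5 digest of a value, or None for NULL input."""
    if value is None:
        return None
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


def build_demo():
    """Build an in-memory database with a base table and a masking view."""
    conn = sqlite3.connect(":memory:")
    # Allow application-defined functions inside views on builds where
    # this is off by default; older SQLite versions ignore the pragma.
    conn.execute("PRAGMA trusted_schema = ON")
    # SQLite has no native MD5 either, so register one for this connection.
    conn.create_function("md5", 1, md5_hex)

    # Hypothetical base table holding a sensitive column (ssn).
    conn.execute("CREATE TABLE customers (id INTEGER, region TEXT, ssn TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "east", "123-45-6789"), (2, "west", "987-65-4321")],
    )

    # The view exposes only the safe columns and a hash of the sensitive one.
    conn.execute(
        "CREATE VIEW customers_masked AS "
        "SELECT id, region, md5(ssn) AS ssn_md5 FROM customers"
    )
    return conn


conn = build_demo()
rows = conn.execute(
    "SELECT id, region, ssn_md5 FROM customers_masked ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

In the model proposed above, users would be granted read access on something like `customers_masked` only, and the right to create or modify such views would be limited to a superuser. That enforcement part is exactly what this sketch leaves out.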