We were doing some tests this past week with Hive authorization. One of our
current "challenges" is the case where we have an underlying, well-managed,
partitioned table and want to allow access to only certain columns in that
table.  Our first thoughts went to VIEWs, as that's a common approach with
relational databases (i.e. set up a view with only the columns you want the
user to access and set the permissions appropriately).
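
For concreteness, this is roughly the setup we had in mind. It is only a
sketch: the table, view, column, and user names are made up, and it assumes
Hive authorization is turned on (hive.security.authorization.enabled=true).

```sql
-- Hypothetical names: web_logs (partitioned base table),
-- web_logs_public (view), analyst1 (user).
-- The view exposes only the columns the user should see.
CREATE VIEW web_logs_public AS
SELECT event_time, url, status_code, dt
FROM web_logs;

-- Grant read access on the view only, not on web_logs itself.
GRANT SELECT ON TABLE web_logs_public TO USER analyst1;
```

The intent is that analyst1 can query web_logs_public while having no
privileges at all on web_logs.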

In testing, and this is not surprising given the "newness" of Hive
authorization, we found that a VIEW cannot be used to allow access to a table
without also granting access to the underlying table, which defeats the idea
of the view as a tool to manage that access.

So I wanted to put this to the user group. I've done some JIRA searching and
didn't find anything (I will admit my JIRA search-fu is not stellar), but
is there an option that could be thrown together in Hive that would allow
that use case?  Perhaps a configuration setting that would allow views to
execute as a specific user (perhaps a global user, or perhaps a user
specified at view creation).  This would give the view access to the
underlying table, and since the view is already created and couldn't be
changed by the user, you could grant "read" permission on the view to the
user or group of users you want to have access.

I suppose this has challenges, e.g. can a user just create a view to bypass
table-level restrictions? Perhaps, if this model were adopted, a privilege
for CREATING/MODIFYING views could be created and granted only to a
superuser of some sort.  I am really just walking through ideas here, as
this is one of the last stumbling blocks we have with Hive from an
"enterprise ready" point of view. Heck, if done right, you could almost do
data masking at the view level. Say you have a column in your source data
that is sensitive: instead of returning that column, you return an MD5 of it
(can we have a native MD5 function? :) or you blank it. If we put strong
security on the creation and modification of views, and allow views to
execute as a different user that has access to the source data, you have a
powerful way to represent your data to all levels within your org.
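
To make the masking idea concrete, a view along these lines is what I'm
picturing. All names are made up, and the sensitive column is blanked with a
typed NULL; a hashed version would need an MD5 UDF, which I'm not assuming
exists natively.

```sql
-- Hypothetical names: customers (source table), customers_masked (view).
CREATE VIEW customers_masked AS
SELECT
  customer_id,
  region,
  CAST(NULL AS STRING) AS ssn   -- sensitive column blanked out
FROM customers;
```

This only helps if read access to customers_masked can be granted without
also granting access to customers, which is exactly the piece that is
missing today.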

Also, since I am just brainstorming here, I'd love to hear what others
may be doing in this area. Perhaps the Hive user community can come up
with a strategic plan while at the same time sharing some shorter-term
workarounds.

Thanks!
