We’re operating long-lived Hive metastore instances in AWS to provide a
metadata source of truth for our data processing pipelines. These pipelines
are not restricted to Hive SQL, but must use frameworks that can integrate
with the metastore (such as Spark). We’re storing data in S3. As these are
centralized shared resources, access control is a concern, primarily to
guard against user error rather than malicious intent (e.g. a user
accidentally drops the wrong table). We’ve been experimenting with Kerberos
in EMR for strong authentication but have no definitive solution for
authorization as yet. We cannot use SQL standard based authorization, as I
believe it is implemented in HiveServer2, which is not a useful integration
point for us. Additionally, there is no workable implementation for metastore
storage based authorization on S3. We’ve toyed with a
HiveMetastoreAuthorizationProvider that evaluates S3 bucket policies, but
this is fairly complex, especially when mapping principals to IAM entities.

More recently we came across DefaultHiveMetastoreAuthorizationProvider in
our travels, which appears to implement legacy GRANT/REVOKE-style controls in
both the Hive client and metastore layers. I’ve managed to get this working
solely in the metastore, and can control access to metadata entities via both
the Hive CLI and the Thrift API. For example, I can craft Thrift calls to
drop a table that correctly fail with one UGI principal but succeed with
another. My metastore service configuration is as follows; note that no
client-side authorization plugin is configured:

hive.security.authorization.enabled=true
hive.security.authorization.createtable.owner.grants=ALL
hive.metastore.pre.event.listeners=org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener
hive.security.metastore.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider
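For reference, here are the same four properties as a hive-site.xml fragment. This is a sketch that assumes the settings live in the metastore service's own hive-site.xml. The names and values are exactly those listed above; the comments are my reading of each property, not official documentation.

```xml
<!-- hive-site.xml for the metastore service only; no client-side authorization plugin is set -->
<configuration>
  <!-- Turns on Hive's (legacy) authorization checks -->
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <!-- Privileges granted automatically to the owner of a newly created table -->
  <property>
    <name>hive.security.authorization.createtable.owner.grants</name>
    <value>ALL</value>
  </property>
  <!-- Listener invoked before metastore events; it consults the authorization manager below -->
  <property>
    <name>hive.metastore.pre.event.listeners</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
  </property>
  <!-- Provider that evaluates GRANT/REVOKE-style privileges inside the metastore -->
  <property>
    <name>hive.security.metastore.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider</value>
  </property>
</configuration>
```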


Aside from the documented shortcomings of this legacy authorization
implementation, can anyone suggest any additional pitfalls with this
configuration?

Thanks,

Elliot.
