We’re operating long-lived Hive metastore instances in AWS to provide a metadata source of truth for our data processing pipelines. These pipelines are not restricted to Hive SQL, but must use frameworks that can integrate with the metastore (such as Spark). We’re storing data in S3. As these are centralized shared resources, access control is a concern. Primarily this is to guard against user error rather than malicious intent (e.g. a user accidentally drops the wrong table). We’ve been experimenting with Kerberos in EMR for strong authentication but have no definitive solution for authorization as yet. We cannot use SQL-based authorization as I believe this is implemented at HiveServer2, which is not a useful integration point for us. Additionally, there is no workable implementation of metastore storage-based authorization on S3. We’ve toyed with a HiveMetastoreAuthorizationProvider that evaluates S3 bucket policies, but this is fairly complex, especially when mapping principals to IAM entities.
More recently we came across DefaultHiveMetastoreAuthorizationProvider, which appears to implement legacy GRANT/REVOKE type controls in both the Hive client and metastore layers. I’ve managed to get this working solely in the metastore, and can control access to metadata entities both via the Hive CLI and the Thrift API. For example, I can craft some Thrift calls to drop a table that correctly fail with one UGI principal, but succeed with another.

My metastore service configuration is as follows. Note that no client-side authorization plugin is configured:

hive.security.authorization.enabled=true
hive.security.authorization.createtable.owner.grants=ALL
hive.metastore.pre.event.listeners=org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener
hive.security.metastore.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider

Aside from the documented shortcomings of this legacy authorization implementation, can anyone suggest any additional pitfalls with this configuration?

Thanks,
Elliot.
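
P.S. For anyone wanting to reproduce this, here are the same four properties written as a hive-site.xml fragment. This is only a sketch: it assumes the metastore service picks up its settings from hive-site.xml, and the names and values are exactly those listed above, with nothing added.

```xml
<configuration>
  <!-- Turn on authorization checks -->
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <!-- Table creators are granted all privileges on their own tables -->
  <property>
    <name>hive.security.authorization.createtable.owner.grants</name>
    <value>ALL</value>
  </property>
  <!-- Run authorization before each metastore operation -->
  <property>
    <name>hive.metastore.pre.event.listeners</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
  </property>
  <!-- Legacy GRANT/REVOKE provider, enforced in the metastore -->
  <property>
    <name>hive.security.metastore.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider</value>
  </property>
</configuration>
```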