Hi Marco,

You can check this out:
https://github.com/awslabs/amazon-emr-user-role-mapper/tree/master/emr-user-role-mapper-s3storagebasedauthorizationmanager

It is open source, part of the AWS EMR utility called URM (User Role Mapper), and we have been using it for two years now.

Thanks,
Wen

On Tue, Jan 17, 2023 at 1:12 AM Marco Jacopo Ferrarotti <marco.ferraro...@gmail.com> wrote:

> Hi,
>
> I'm building an on prem data warehouse with a custom s3 gateway as storage
> backend. I was able to deploy a standalone Hive Metastore Server (HMS)
> secured by kerberos however now I'm having a hard time figuring out how to
> manage authorization.
>
> It seems to me that the storage based authorization layer is not
> compatible with s3a since hadoop reports just stub permissions for such
> "fs". On the other hand SQL Standards Based Authorization would force me
> to restrict everyone to access the data by means of hiveserver2 and this is
> not a viable solution for my use case. At least I would like to have a
> two-way access to the data/metadata:
>
> 1. using pySpark (mainly to develop ETL/ELT pipelines);
> 2. using a JDBC/ODBC connector (mainly to feed BI dashboards), for this I
> was considering the spark-thrift server but I'm open to hive2 as well;
>
> Am I missing something? Right now the only option I see would be to write
> a custom MetastoreAuthorizationProvider that checks s3a permissions either
> by querying the bucket ACLs or by performing test read/write/delete actions
> on the bucket. Has anyone tried to implement something similar?
>
> Thanks,
> Marco
>
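
For reference, the "test read/write/delete actions" idea from the quoted message could be sketched roughly as below. This is only an illustration of the probing logic, not how URM or any Hive authorization provider is implemented. Everything in it is hypothetical: `InMemoryStore` is a stand-in for a real S3 client talking to the gateway, and `probe_permissions` and the probe key name are made up for the example.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in for an S3 client (assumption, not a real API)."""

    def __init__(self, allowed):
        self.allowed = set(allowed)  # e.g. {"read", "write"}
        self.objects = {}

    def _check(self, action):
        # A real client would get an access-denied error from the gateway.
        if action not in self.allowed:
            raise PermissionError(action)

    def put(self, key, data):
        self._check("write")
        self.objects[key] = data

    def get(self, key):
        self._check("read")
        return self.objects[key]

    def delete(self, key):
        self._check("delete")
        self.objects.pop(key, None)


def probe_permissions(store, prefix):
    """Return the subset of {"read", "write", "delete"} that succeeds under prefix."""
    key = f"{prefix.rstrip('/')}/.authz-probe-{uuid.uuid4().hex}"
    granted = set()

    try:
        store.put(key, b"")
        granted.add("write")
    except PermissionError:
        pass

    try:
        store.get(key)
        granted.add("read")
    except PermissionError:
        pass
    except KeyError:
        # The probe object is missing because the write was denied,
        # but the read call itself was permitted.
        granted.add("read")

    try:
        store.delete(key)
        granted.add("delete")
    except PermissionError:
        # Note: if write succeeded but delete is denied, the probe object stays behind.
        pass

    return granted
```

A metastore-side check would then compare the returned set with the privileges the requested operation needs, for example requiring `"write"` on the table location before allowing an ALTER. Probing on every call adds round trips to the gateway, so some caching would likely be needed in practice.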