Hi Marco,

You can check this out:
https://github.com/awslabs/amazon-emr-user-role-mapper/tree/master/emr-user-role-mapper-s3storagebasedauthorizationmanager
It is open-sourced as part of the AWS EMR utility named URM (User Role
Mapper), and we have been using it for two years now.
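
If you still want to try the "test read/write/delete" idea from your
message, here is a minimal Python sketch of just the probing step. It is
not how URM works and it is not a Hive authorization provider; it only
shows the check itself. The names probe_prefix, ProbeResult and the
".authz-probe-" marker key are hypothetical, and the client is assumed to
be any object exposing the boto3 S3 client methods list_objects_v2,
put_object and delete_object.

```python
import uuid
from dataclasses import dataclass


@dataclass
class ProbeResult:
    can_read: bool
    can_write: bool
    can_delete: bool


def probe_prefix(client, bucket, prefix):
    """Try a list, a put and a delete under `prefix`; report which succeeded.

    `client` is assumed to behave like a boto3 S3 client
    (list_objects_v2, put_object, delete_object).
    """
    # Hypothetical marker key; the random suffix avoids touching real data.
    key = f"{prefix.rstrip('/')}/.authz-probe-{uuid.uuid4().hex}"

    def attempt(call, **kwargs):
        try:
            call(**kwargs)
            return True
        except Exception:
            # Sketch only: every failure is treated as "denied". A real check
            # should separate access errors from network or timeout errors.
            return False

    can_read = attempt(client.list_objects_v2,
                       Bucket=bucket, Prefix=prefix, MaxKeys=1)
    can_write = attempt(client.put_object, Bucket=bucket, Key=key, Body=b"")
    # The delete is attempted even if the put failed. This assumes the gateway
    # accepts a delete of a missing key when the caller is allowed to delete.
    can_delete = attempt(client.delete_object, Bucket=bucket, Key=key)
    return ProbeResult(can_read, can_write, can_delete)
```

With boto3 you would pass something like boto3.client("s3",
endpoint_url=...) pointing at your gateway, using the credentials of the
user being checked. Keep in mind that a marker object is left behind when
the write is allowed but the delete is not, and that probing on every
metastore call adds round trips, so results would probably need caching.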

Thanks

Wen

On Tue, Jan 17, 2023 at 1:12 AM Marco Jacopo Ferrarotti <
marco.ferraro...@gmail.com> wrote:

> Hi,
>
> I'm building an on-prem data warehouse with a custom S3 gateway as the
> storage backend. I was able to deploy a standalone Hive Metastore Server
> (HMS) secured by Kerberos; however, I'm now having a hard time figuring
> out how to manage authorization.
>
> It seems to me that the storage-based authorization layer is not
> compatible with s3a, since Hadoop reports only stub permissions for that
> filesystem. On the other hand, SQL Standards Based Authorization would
> force me to restrict everyone to accessing the data through HiveServer2,
> which is not a viable solution for my use case. At a minimum, I would like
> to have two ways to access the data/metadata:
>
> 1. using PySpark (mainly to develop ETL/ELT pipelines);
> 2. using a JDBC/ODBC connector (mainly to feed BI dashboards); for this I
> was considering the Spark Thrift Server, but I'm open to HiveServer2 as well.
>
> Am I missing something? Right now the only option I see would be to write
> a custom MetastoreAuthorizationProvider that checks s3a permissions either
> by querying the bucket ACLs or by performing test read/write/delete actions
> on the bucket. Has anyone tried to implement something similar?
>
> Thanks,
> Marco
>
