Hi all, I wanted to understand what authorization model is most suitable for a production environment where most of the data is shared between multiple teams and users. I know this is would depend more on the use case but I cant seem to figure out the best model for our use:
We have data that is owned by a certain process (R/W access for that user) while other users only have Read access to that data. We have a lot of instances when users would want to create external tables pointing to this data. We tried the following 3 auth models: 1. *Default Authorization model*: This we think is less secure and any user can grant himself access to create/modify tables and databases even where they are not supposed to. We would want to have much tighter security than this model provides. 2. *Storage Based Authorization*: While this helps us by preventing users from modifying metadata by checking the HDFS permissions of the underlying directories, it prevents our most important use case of letting users create *external *tables on data they dont have write access to. I would assume external tables wont actually delete the data when dropping tables/partitions so this operation should be allowed. But because it is not, even this authorization model does not meet our use case. 3. *Sql Standard Based Authorization: *This does give us fine-grained control over which users can perform specific commands, but when it comes to creating external tables, even this authorization scheme seems to use the filesystem's permissions. So overall all 3 models didnt seem to fulfill our requirement here which I think would be a fairly common one. I want to know how other users manage security on Hive or If i am missing something. Thanks in advance, Udit