Hi all,
I have a question about Hive 3 managed tables and how they should be used
in a production setting, let's say in an enterprise environment.

As far as I understand, managed tables have a helpful set of features.
See
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/using-hiveql/content/hive_hive_3_tables.html
So I see many reasons to use managed tables instead of external tables.

The Hive documentation says that the data of managed tables is completely
managed by Hive. That means the managed table space (HDFS path) is owned
by the user `hive`, and only the owner has `rwx` on this path. No one
else. So using `beeline` as a user other than `hive`, or even as
`hive` but with impersonation/proxy-user, does not give me access to the data
via a SELECT statement.

In an enterprise environment, impersonation plays an important role. To
allow access to the data, `ranger` (in HDP) comes into play.
Is my assumption correct that `ranger` should be used to set ACLs that
give a set of groups/users access to the path of specific *managed* tables?

Second question...
If `ranger` opens the door to the data, I would be able to read the data directly
from HDFS, let's say with a third-party tool. But I believe this is
not a good option, given how Hive works with
transactional tables. See
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/using-hiveql/content/hive_3_internals.html
What I mean is the use of deltas/buckets etc. Do you agree that direct
access to the HDFS files in the managed table space is not recommended?
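
To illustrate my concern, here is a minimal Python sketch. The directory
names and write IDs are made up, and the `base_`/`delta_`/`delete_delta_`
naming is only my understanding of the transactional layout, so please
treat it as an assumption and not as how Hive really resolves a read.
`split_live_and_obsolete` is a hypothetical helper, not a Hive API.

```python
import re

# Assumed naming: base_<id>, delta_<min>_<max>[_<stmt>], delete_delta_<min>_<max>[_<stmt>]
NAME = re.compile(r"^(base|delta|delete_delta)_(\d+)(?:_(\d+))?")

def split_live_and_obsolete(names):
    """Split directory names into those a reader still needs and those
    already covered by the newest base. This ignores whether a write was
    committed or aborted, which cannot be seen from the names alone."""
    parsed = []
    for name in names:
        m = NAME.match(name)
        if not m:
            continue  # skip anything that does not follow the assumed pattern
        kind = m.group(1)
        low = int(m.group(2))
        high = int(m.group(3)) if m.group(3) else low
        parsed.append((name, kind, high))

    # The newest base supersedes every older base and every delta up to its ID.
    newest_base = max((h for _, k, h in parsed if k == "base"), default=0)

    live, obsolete = [], []
    for name, kind, high in parsed:
        if kind == "base":
            (live if high == newest_base else obsolete).append(name)
        else:
            (live if high > newest_base else obsolete).append(name)
    return live, obsolete

# Hypothetical listing of one managed table directory.
listing = [
    "base_0000005",
    "delta_0000003_0000003_0000",         # already folded into base_0000005
    "delta_0000006_0000006_0000",         # rows inserted after the base
    "delete_delta_0000007_0000007_0000",  # rows deleted after the base
]

live, obsolete = split_live_and_obsolete(listing)
print("needed:  ", live)
print("obsolete:", obsolete)
```

A third-party tool that simply reads every file under the table path would
count the rows of the obsolete delta twice and would not apply the
delete delta at all, which is why I doubt that direct HDFS access is safe.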

Thanks,
Marko
