Hi all, I have a question about Hive 3 managed tables and how they should be used in a production environment, let's say in an enterprise environment.
As far as I understand, managed tables have a helpful set of features. See https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/using-hiveql/content/hive_hive_3_tables.html So I see many reasons to use managed tables instead of external tables.

The Hive documentation says that the data of managed tables is completely managed by Hive. That means the managed table space (HDFS path) is owned by the user `hive`, and only the owner has `rwx` on this path. No one else. So using `beeline` as a user other than `hive`, or even as `hive` but with impersonation/proxy user, does not give me access to the data via a SELECT statement.

In an enterprise environment impersonation plays an important role. To allow access to the data, `ranger` (in HDP) comes into play. Is my assumption correct that I should use `ranger` to set ACLs that give a set of groups/users access to the path of specific *managed* tables?

Second question: if `ranger` opens the door to the data, I'm able to read the data directly from HDFS, let's say with a third-party tool. But I believe this is not a good option, given how Hive works with transactional tables. See https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/using-hiveql/content/hive_3_internals.html What I mean is the usage of deltas/buckets etc. Do you agree that direct access to the HDFS files in the managed table space is not recommended?

Thanks,
Marko