[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

Carl Steinbach (JIRA) Mon, 04 Jun 2012 18:03:29 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289040#comment-13289040
 ]


Carl Steinbach commented on HIVE-2989:
--------------------------------------

# I think it would be a good idea to require the user to name the table link 
(using a valid SQL identifier) instead of tying the name of the link to the 
table/db it points to. Table links (like views) can provide a useful level of 
indirection, but we lose that if the name of link has to map directly to the 
target table. For example, suppose table t1 exists in the default db, and the 
table link for t1 is created in database db1. Suppose at some later point t1 is 
moved to a different db, or that the name of t1 is changed. With the current 
implementation we would then also have to change the name of the link, which 
would require us to also change any applications or scripts that refer to this 
link. With the other approach we would only need to ALTER the details of the 
link but would be able leave the name of link unchanged.
# Given a table t, is there a way to print out the list of table links that 
point to t?
# What is the expected behavior if I create a link to target@default and then 
drop target?
## There are a bunch of variations on this same question, e.g. what happens if 
I change the name of the target table, or if I add columns to the target table, 
etc. There should be test coverage for each one of these cases.
# What is the expected behavior if a user creates a link to target@default, and 
then target is subsequently dropped and then created again by another user? 
Should the link continue to work? If so, doesn't this create a potential 
security vulnerability?
# It looks likes there's currently no difference in the way DYNAMIC and STATIC 
table links behave. If that's true then I think we should remove this from the 
grammar and metadata in order to avoid confusion.
# DESCRIBE FORMATTED needs to be updated to print the linkTarget and linkTables 
fields.

                
> Adding Table Links to Hive
> --------------------------
>
>                 Key: HIVE-2989
>                 URL: https://issues.apache.org/jira/browse/HIVE-2989
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore, Query Processor, Security
>    Affects Versions: 0.10.0
>            Reporter: Bhushan Mandhani
>            Assignee: Bhushan Mandhani
>         Attachments: HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, 
> HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, 
> HIVE-2989.6.patch.txt
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> This will add Table Links to Hive. This will be an alternate mechanism for a 
> user to access tables and data in a database that is different from the one 
> he is associated with. This feature can be used to provide access control (if 
> access to databasename.tablename in queries and "use database X" is turned 
> off in conjunction).
> If db X wants to access one or more partitions from table T in db Y, the user 
> will issue:
> CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
> New partitions added to T will automatically be added to the link as well and 
> become available to X. However, if the link is specified to be static, that 
> will not be the case. The X user will then have to explicitly import each 
> partition of T that he needs. The command above will not actually make any 
> existing partitions of T available to X. Instead, we provide the following 
> command to add an existing partition to a link:
> ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
> The user will need to execute the above for each existing partition that 
> needs to be imported. For future partitions, Hive will take care of this. An 
> imported partition can be dropped from a link using a similar command. We 
> just specify "DROP" instead of "ADD". For querying the linked table, the X 
> user will refer to it as T@Y. Link Tables will only have read access and not 
> be writable. The entire Table Link alongwith all its imported partitions can 
> be dropped as follows:
> DROP LINK TO T@Y
> The above commands are purely MetaStore operations. The implementation will 
> rely on replicating the entire partition metadata when a partition is added 
> to a link.  For every link that is created, we will add a new row to table 
> TBLS. The TBL_TYPE column will have a new kind of value "LINK_TABLE" (or 
> "STATIC_LINK_TABLE" if the link has been specified as static). A new column 
> LINK_TBL_ID will be added which will contain the id of the imported table. It 
> will be NULL for all other table types including the regular managed tables. 
> When a partition is added to a link, the new row in the table PARTITIONS will 
> point to the LINK_TABLE in the same database  and not the master table in the 
> other database. We will replicate all the metadata for this partition from 
> the master database. The advantage of this approach is that fewer changes 
> will be needed in query processing and DDL for LINK_TABLEs. Also, commands 
> like "SHOW TABLES" and "SHOW PARTITIONS" will work as expected for 
> LINK_TABLEs too. Of course, even though the metadata is not shared, the 
> underlying data on disk is still shared. Hive still needs to know that when 
> dropping a partition which belongs to a LINK_TABLE, it should not drop the 
> underlying data from HDFS. Views and external tables cannot be imported from 
> one database to another.
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

Reply via email to