[ https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-18814:
----------------------------------
    Status: Patch Available  (was: Open)

> Support Add Partition For Acid tables
> -------------------------------------
>
>                 Key: HIVE-18814
>                 URL: https://issues.apache.org/jira/browse/HIVE-18814
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
>         Attachments: HIVE-18814.01.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the
> location to the directory containing data files.
> In current master (Hive 3.0), Add Partition on an acid table doesn't fail and
> at read time the data is decorated with ROW__ID but the original transaction
> is 0. I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't
> going to generate duplicate IDs but it could violate Snapshot Isolation in
> multi stmt txns. Suppose txnid:7 runs {{select * from T}}. Then txnid:8
> adds a partition to T. Now if txnid:7 runs the same query again, it will see
> the data in the new partition.
> This can't be released like this since a delete on this data (added via Add
> Partition) will use ROW__IDs with txnid:0, so a later upgrade that sees
> un-compacted data may generate ROW__IDs with a different txnid (assuming this
> is fixed by then).
>
> One option is to follow the Load Data approach and create a new delta_x_x/
> and move/copy the data there.
>
> Another is to allocate a new writeid and save it in Partition metadata. This
> could then be used to decorate data with ROW__IDs. This avoids move/copy but
> retains data "outside" of the table tree, which makes it more likely that
> this data will be modified in some way, which can really break things if done
> after any SQL update/delete on this data has happened.
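
The Snapshot Isolation concern in the description above (txnid:7 reading T twice while txnid:8 adds a partition) can be illustrated with a toy model. This is only a sketch and not Hive code: the `Table` and `Partition` classes, the `select_all` and `repeated_read` helpers, and the single "high watermark" visibility check are all hypothetical simplifications. In particular, it treats txnid and writeid as the same number, which real Hive does not do.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    rows: list
    write_id: int = 0  # 0 models data decorated with "original transaction 0"


@dataclass
class Table:
    partitions: list = field(default_factory=list)

    def add_partition(self, name, rows, write_id=0):
        self.partitions.append(Partition(name, rows, write_id))

    def select_all(self, snapshot_high_watermark):
        # Simplified visibility rule: a reader sees a partition's rows only
        # if their write id is not newer than the reader's snapshot.
        visible = []
        for p in self.partitions:
            if p.write_id <= snapshot_high_watermark:
                visible.extend(p.rows)
        return visible


def repeated_read(allocate_write_id):
    """Run the same query twice in txn 7 while txn 8 adds a partition."""
    t = Table()
    t.add_partition("p=1", ["a", "b"], write_id=5)

    snapshot = 7  # txn 7 starts and fixes its snapshot
    first = t.select_all(snapshot)

    # txn 8 adds a partition while txn 7 is still open. Without an allocated
    # write id, the new data is tagged 0 and looks "old" to every snapshot.
    t.add_partition("p=2", ["c"], write_id=8 if allocate_write_id else 0)

    second = t.select_all(snapshot)
    return first, second


if __name__ == "__main__":
    print(repeated_read(allocate_write_id=False))
    print(repeated_read(allocate_write_id=True))
```

Under these assumptions, tagging the added data with 0 lets the second read in txn 7 see the new partition, while tagging it with a freshly allocated id (the second option above) keeps both reads identical.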
>
> Add Partition performs no validations on add (except for the partition
> spec), so any file with any format can be added. It allows add to bucketed
> tables as well. Seems like a very dangerous command. Maybe a better option
> is to block it and advise using Load Data. Alternatively, make this do an
> Add Partition metadata op followed by Load Data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)