[ https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273282#comment-16273282 ]
Alan Gates commented on HIVE-17361:
-----------------------------------

Put a couple of comments on Review Board.

> Support LOAD DATA for transactional tables
> ------------------------------------------
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
> Issue Type: New Feature
> Components: Transactions
> Reporter: Wei Zheng
> Assignee: Eugene Koifman
> Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch,
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch,
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch,
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch,
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.21.patch,
> HIVE-17361.23.patch, HIVE-17361.24.patch, HIVE-17361.25.patch,
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA has not been supported since ACID was introduced. We need to close
> this gap between ACID tables and regular Hive tables.
> Current documentation is under [DML Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
> and [Loading files into tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validation of the data. In particular, it
> uses the input file name, which may not be of the form 00000_0; this can break
> some read logic (it certainly will for Acid).
> * It does not check the schema of the file. This may be a non-issue for Acid,
> which requires ORC; since ORC is self-describing, Schema Evolution may handle
> this seamlessly (assuming the schema is not too different).
> * It does check that _InputFormat_S are compatible.
> * Bucketed (and thus sorted) tables don't support Load Data (but only if
> hive.strict.checks.bucketing=true, which is the default). This restriction
> will be kept for Acid.
> * Load Data supports the OVERWRITE clause.
> * What happens to file permissions/ownership: rename vs. copy differences.
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a
> base_N/ dir for the OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8
> becomes visible before it's populated. Is that an issue?
> It's not, since txn 8 is not committed.
>
> h3. Implementation Notes/Limitations (patch 25)
> * Bucketed/sorted tables are not supported.
> * Input file names must be of the form 00000_0/00000_0_copy_1; this is
> enforced (HIVE-18125).
> * Load Data creates a delta_x_x/ that contains the new files.
> * Load Data with OVERWRITE creates a base_x/ that contains the new files.
> * A '_metadata_acid' file is placed in the target directory to indicate that
> it requires special handling on read.
> * The input files must be 'plain' ORC files, i.e. they must not contain the
> Acid metadata columns they would have if they were copied from another Acid
> table. In that case, the ROW_IDs embedded in the data may not make sense in
> the target table (if it's in a different cluster, for example). Such files
> may also have a mix of committed and aborted data.
> ** This could be relaxed later by adding info to the _metadata_acid file to
> ignore existing ROW_IDs on read.
> * ROW_IDs are attached dynamically at read time and made permanent by
> compaction. This is done the same way as the handling of files that were
> written to a table before it was converted to Acid.
> * Vectorization is supported.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
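For illustration only, the input checks listed in this issue (no bucketed/sorted target, file names of the form 00000_0 or 00000_0_copy_N, no embedded Acid metadata columns) could be sketched in Java as below. `LoadDataChecks` and its methods are hypothetical, not Hive APIs. The file-name regex and the set of Acid event column names (`operation`, `originalTransaction`, `bucket`, `rowId`, `currentTransaction`, `row`) are assumptions about Hive's on-disk layout, and the bucketing check is unconditional because the issue says that restriction is kept for Acid.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical pre-checks for LOAD DATA into a transactional table.
// These names are illustrative only; they are not Hive APIs.
final class LoadDataChecks {
    // Assumed file-name shape: <digits>_<digits>, optionally followed by
    // _copy_<digits>, e.g. 00000_0 or 00000_0_copy_1.
    private static final Pattern INPUT_FILE_NAME =
        Pattern.compile("^[0-9]+_[0-9]+(_copy_[0-9]+)?$");

    // Assumed top-level columns of a file written by an Acid table.
    private static final Set<String> ACID_EVENT_COLUMNS = Set.of(
        "operation", "originalTransaction", "bucket",
        "rowId", "currentTransaction", "row");

    private LoadDataChecks() {}

    static boolean isValidInputFileName(String fileName) {
        return INPUT_FILE_NAME.matcher(fileName).matches();
    }

    // True if the top-level schema already carries Acid metadata columns,
    // i.e. the file was most likely copied from another Acid table.
    static boolean hasAcidMetadataColumns(List<String> topLevelColumns) {
        return topLevelColumns.containsAll(ACID_EVENT_COLUMNS);
    }

    // Throws IllegalArgumentException if the file must be rejected.
    static void validate(boolean targetIsBucketed, String fileName,
                         List<String> topLevelColumns) {
        if (targetIsBucketed) {
            throw new IllegalArgumentException(
                "LOAD DATA into bucketed/sorted Acid tables is not supported");
        }
        if (!isValidInputFileName(fileName)) {
            throw new IllegalArgumentException(
                "input file name must look like 00000_0 or 00000_0_copy_N: " + fileName);
        }
        if (hasAcidMetadataColumns(topLevelColumns)) {
            throw new IllegalArgumentException(
                "input must be a plain ORC file without Acid metadata columns");
        }
    }

    public static void main(String[] args) {
        validate(false, "00000_0", List.of("id", "name")); // accepted, no exception
        System.out.println(isValidInputFileName("data.orc")); // false
    }
}
```

Rejecting up front keeps the read path simple: a reader never has to guess whether a loaded file carries ROW_IDs from some other table.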
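The directory layout (delta_N_N/ for Load Data, base_N/ for OVERWRITE) and the argument that a not-yet-populated delta_8_8/ is harmless because txn 8 is uncommitted can be modeled roughly as follows. This is a simplified, hypothetical sketch: `LoadDataLayout` is not a Hive class, committed transactions are a plain `Set<Long>`, only the single-transaction directories that LOAD DATA creates are handled, and names use the unpadded notation of this issue. Hive's real readers track transaction state differently and may format directory names differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, hypothetical model of the directories LOAD DATA writes
// and of which of them a reader would see.
final class LoadDataLayout {
    private static final Pattern DELTA = Pattern.compile("^delta_([0-9]+)_([0-9]+)$");
    private static final Pattern BASE = Pattern.compile("^base_([0-9]+)$");

    private LoadDataLayout() {}

    // LOAD DATA -> delta_N_N/, LOAD DATA ... OVERWRITE -> base_N/ (N = the load's txn id).
    static String targetDir(long txnId, boolean overwrite) {
        return overwrite ? "base_" + txnId : "delta_" + txnId + "_" + txnId;
    }

    // Returns the newest committed base_N/ (if any) followed by the committed
    // single-txn deltas newer than it. Directories of open or aborted txns are
    // skipped, so a partially populated delta_8_8/ is ignored while txn 8 is open.
    static List<String> visibleDirs(List<String> dirs, Set<Long> committedTxns) {
        String bestBase = null;
        long bestBaseTxn = -1;
        for (String dir : dirs) {
            Matcher b = BASE.matcher(dir);
            if (b.matches()) {
                long txn = Long.parseLong(b.group(1));
                if (committedTxns.contains(txn) && txn > bestBaseTxn) {
                    bestBase = dir;
                    bestBaseTxn = txn;
                }
            }
        }
        List<String> visible = new ArrayList<>();
        if (bestBase != null) {
            visible.add(bestBase);
        }
        for (String dir : dirs) {
            Matcher d = DELTA.matcher(dir);
            if (d.matches()) {
                long min = Long.parseLong(d.group(1));
                long max = Long.parseLong(d.group(2));
                // Only single-txn deltas (as created by LOAD DATA) are modeled here;
                // deltas at or below the chosen base are superseded by it.
                if (min == max && min > bestBaseTxn && committedTxns.contains(min)) {
                    visible.add(dir);
                }
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("delta_5_5", "base_6", "delta_7_7", "delta_8_8");
        // txn 8 is still open, so delta_8_8 is not visible yet.
        System.out.println(visibleDirs(dirs, Set.of(5L, 6L, 7L))); // [base_6, delta_7_7]
    }
}
```

Using a base for OVERWRITE means older deltas drop out of the read set without being deleted, which is the same idea the issue attributes to HIVE-14988.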
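Attaching ROW_IDs at read time could look roughly like the sketch below. It assumes, hypothetically, that every row of a loaded file in delta_N_N/ gets the triple (N, bucket, rowId), with the bucket parsed from the file name and rowId counted continuously across 00000_0, 00000_0_copy_1, ... so that copies of the same bucket never reuse an ID. `SyntheticRowIds` is an illustrative name, not Hive's reader; it does not read ORC and simply takes per-file row counts as input.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SyntheticRowIds {
    // Hypothetical stand-in for a ROW_ID triple.
    static final class RowId {
        final long txnId;
        final int bucket;
        final long rowId;

        RowId(long txnId, int bucket, long rowId) {
            this.txnId = txnId;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override
        public String toString() {
            return "(" + txnId + "," + bucket + "," + rowId + ")";
        }
    }

    // Bucket id = leading digits of 00000_0 / 00000_0_copy_N (assumed naming).
    private static final Pattern FILE_NAME =
        Pattern.compile("^([0-9]+)_[0-9]+(_copy_[0-9]+)?$");

    private SyntheticRowIds() {}

    static int bucketOf(String fileName) {
        Matcher m = FILE_NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected file name: " + fileName);
        }
        return Integer.parseInt(m.group(1));
    }

    // Assigns ROW_IDs to every row of the given files, which must be listed in
    // read order (00000_0 before 00000_0_copy_1, ...). rowId continues across
    // copies of the same bucket so that IDs never repeat within a bucket.
    static List<RowId> assign(long txnId, List<String> fileNames, List<Long> rowCounts) {
        Map<Integer, Long> nextRowId = new HashMap<>();
        List<RowId> ids = new ArrayList<>();
        for (int i = 0; i < fileNames.size(); i++) {
            int bucket = bucketOf(fileNames.get(i));
            long start = nextRowId.getOrDefault(bucket, 0L);
            long count = rowCounts.get(i);
            for (long r = 0; r < count; r++) {
                ids.add(new RowId(txnId, bucket, start + r));
            }
            nextRowId.put(bucket, start + count);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<RowId> ids = assign(8, List.of("00000_0", "00000_0_copy_1", "00001_0"),
                                 List.of(2L, 1L, 1L));
        System.out.println(ids); // [(8,0,0), (8,0,1), (8,0,2), (8,1,0)]
    }
}
```

Because the IDs are derived deterministically from the directory, file names and row order, every read yields the same values until compaction writes them into the files and makes them permanent.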