[ https://issues.apache.org/jira/browse/HIVE-18098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koifman updated HIVE-18098:
----------------------------------
    Description: 
How should this work?
For regular tables, export just copies the files under the table root to a specified directory.
This doesn't make sense for Acid tables:
* Some data may belong to aborted transactions.
* Transaction IDs are embedded into data/file names. You'd have to export delta/ and base/, each of which may have files with the same names, e.g. bucket_00000.
* On import these IDs won't make sense in a different cluster or even a different table (which may have delta_x_x, for example, for the same x but with different data, of course).
* Export creates a _metadata file with column types, storage format, etc. Perhaps it can include info about aborted IDs (if the whole file can't be skipped).
* Even importing into the same table on the same cluster may be a problem. For example, delta_5_5/ existed at the time of export and was included in the export, but 2 days later it may no longer exist because it was compacted and cleaned.
* If importing back into the same table on the same cluster, the data could be imported into a different transaction (assuming per-table writeIDs) without having to remap the IDs in the rows themselves.
* Support Import Overwrite?
* Support Import as a new txn with remapping of ROW__IDs? The new writeID can be stored in a delta_x_x/_meta_data and ROW__IDs can be remapped at read time (like isOriginal) and made permanent by compaction.
* It doesn't seem reasonable to import Acid data into a non-Acid table.
* Perhaps import can work similarly to Load Data: look at the file imported and, if it has Acid columns, leave a note in the delta_x_x/_meta_data to indicate that these columns should be skipped and new ROW__IDs assigned at read time.

  was:
How should this work?
For regular tables, export just copies the files under the table root to a specified directory.
This doesn't make sense for Acid tables:
* Some data may belong to aborted transactions.
* Transaction IDs are embedded into data/file names.
You'd have to export delta/ and base/, each of which may have files with the same names, e.g. bucket_00000.
* On import these IDs won't make sense in a different cluster or even a different table (which may have delta_x_x, for example, for the same x but with different data, of course).
* Export creates a _metadata file with column types, storage format, etc. Perhaps it can include info about aborted IDs (if the whole file can't be skipped).
* Even importing into the same table on the same cluster may be a problem. For example, delta_5_5/ existed at the time of export and was included in the export, but 2 days later it may no longer exist because it was compacted and cleaned.
* If importing back into the same table on the same cluster, the data could be imported into a different transaction (assuming per-table writeIDs) without having to remap the IDs in the rows themselves.
* Support Import Overwrite?
* Support Import as a new txn with remapping of ROW__IDs? The new writeID can be stored in a delta_x_x/_meta_data and ROW__IDs can be remapped at read time (like isOriginal) and made permanent by compaction.
* It doesn't seem reasonable to import Acid data into a non-Acid table.


> Add support for Export/Import for Acid tables
> ---------------------------------------------
>
>                 Key: HIVE-18098
>                 URL: https://issues.apache.org/jira/browse/HIVE-18098
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> How should this work?
> For regular tables, export just copies the files under the table root to a
> specified directory.
> This doesn't make sense for Acid tables:
> * Some data may belong to aborted transactions.
> * Transaction IDs are embedded into data/file names. You'd have to export
> delta/ and base/, each of which may have files with the same names, e.g.
> bucket_00000.
> * On import these IDs won't make sense in a different cluster or even a
> different table (which may have delta_x_x, for example, for the same x but
> with different data, of course).
> * Export creates a _metadata file with column types, storage format, etc.
> Perhaps it can include info about aborted IDs (if the whole file can't be
> skipped).
> * Even importing into the same table on the same cluster may be a problem.
> For example, delta_5_5/ existed at the time of export and was included in
> the export, but 2 days later it may no longer exist because it was compacted
> and cleaned.
> * If importing back into the same table on the same cluster, the data could
> be imported into a different transaction (assuming per-table writeIDs)
> without having to remap the IDs in the rows themselves.
> * Support Import Overwrite?
> * Support Import as a new txn with remapping of ROW__IDs? The new writeID
> can be stored in a delta_x_x/_meta_data and ROW__IDs can be remapped at read
> time (like isOriginal) and made permanent by compaction.
> * It doesn't seem reasonable to import Acid data into a non-Acid table.
> * Perhaps import can work similarly to Load Data: look at the file imported
> and, if it has Acid columns, leave a note in the delta_x_x/_meta_data to
> indicate that these columns should be skipped and new ROW__IDs assigned at
> read time.
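
To make the points about aborted transactions concrete, here is a minimal sketch of how an export step might classify the directories under a table root. It is not Hive code. The function name plan_export, the ExportPlan type, and the idea of passing the aborted write IDs in as a plain set are all assumptions made for illustration, and the directory pattern assumes names of the form delta_<minWriteId>_<maxWriteId> with an optional statement-ID suffix. Directories that do not match (base_N, for instance) are simply copied, which assumes they hold no aborted data.

```python
import re
from typing import Dict, Iterable, List, NamedTuple

# Assumed naming: delta_<min>_<max> or delete_delta_<min>_<max>,
# optionally followed by _<statementId>. Zero-padded numbers are accepted.
DELTA_RE = re.compile(r"^(?:delete_)?delta_(\d+)_(\d+)(?:_\d+)?$")


class ExportPlan(NamedTuple):
    copy: List[str]                # no aborted write IDs: copy as-is
    skip: List[str]                # every write ID aborted: leave out entirely
    partial: Dict[str, List[int]]  # some aborted: record these IDs in export metadata


def plan_export(dir_names: Iterable[str], aborted: Iterable[int]) -> ExportPlan:
    """Classify table-root directories for a hypothetical Acid-aware export."""
    aborted_ids = set(aborted)
    copy: List[str] = []
    skip: List[str] = []
    partial: Dict[str, List[int]] = {}
    for name in dir_names:
        match = DELTA_RE.match(name)
        if match is None:
            # Not a delta directory (e.g. base_N); assumed free of aborted data.
            copy.append(name)
            continue
        lo, hi = int(match.group(1)), int(match.group(2))
        bad = sorted(w for w in aborted_ids if lo <= w <= hi)
        if not bad:
            copy.append(name)
        elif len(bad) == hi - lo + 1:
            # The whole write ID range was aborted, so the directory can be skipped.
            skip.append(name)
        else:
            # The directory mixes committed and aborted data; it cannot be skipped,
            # so the aborted IDs would have to travel with the export.
            partial[name] = bad
    return ExportPlan(copy, skip, partial)
```

With directories base_4, delta_5_5, delta_6_8 and delta_9_9 and aborted write IDs 5 and 7, the sketch copies base_4 and delta_9_9, skips delta_5_5, and reports write ID 7 for delta_6_8. That last case is the one where the export metadata would need to carry the aborted IDs.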