[ https://issues.apache.org/jira/browse/HIVE-21757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853551#comment-16853551 ]
Thejas M Nair commented on HIVE-21757:
--------------------------------------

For Hive replication, we want to be able to run compaction independently on both source and target, rather than copying compacted files. Replicating/copying compacted files would increase the cross-data-center bandwidth usage several-fold.
For the purpose of invalidating the file list cache, a simple table-level marker like the one you suggest should suffice. Maybe a property like "lastCleanupId=<cleanup id>" would be sufficient. <cleanup id> could be an integer sequence or the global txn id itself.

> ACID: use a new write id for compaction's output instead of the visibility id
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-21757
>                 URL: https://issues.apache.org/jira/browse/HIVE-21757
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Vaibhav Gumashta
>            Priority: Major
>
> HIVE-20823 added support for running compaction within a transaction. To
> control the visibility of the output directory, it uses
> base_writeId_visibilityId, where visibilityId is the transaction id of the
> transaction that the compactor ran in. Perhaps we can keep using the
> base_writeId format, by allocating a new writeId for the compactor and
> creating the new base/delta with that.
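
To make the table-level marker idea from the comment concrete, here is a minimal sketch of how a file list cache might use it. This is not Hive code: the class and method names (FileListCache, CachedListing, get, put) are hypothetical, the table properties are modelled as a plain dict, and treating a missing "lastCleanupId" as 0 is an assumption. Only the property name comes from the comment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Property name taken from the comment; everything else here is illustrative.
CLEANUP_PROP = "lastCleanupId"


@dataclass
class CachedListing:
    cleanup_id: int      # value of lastCleanupId when the listing was cached
    files: List[str]     # cached file paths for the table


def current_cleanup_id(table_props: Dict[str, str]) -> int:
    # Assumption: a table that has never been cleaned has no marker, treated as 0.
    return int(table_props.get(CLEANUP_PROP, "0"))


class FileListCache:
    """Hypothetical per-table file list cache, invalidated by the marker."""

    def __init__(self) -> None:
        self._entries: Dict[str, CachedListing] = {}

    def put(self, table: str, table_props: Dict[str, str], files: List[str]) -> None:
        # Remember which cleanup id the listing was taken under.
        self._entries[table] = CachedListing(current_cleanup_id(table_props), list(files))

    def get(self, table: str, table_props: Dict[str, str]) -> Optional[List[str]]:
        entry = self._entries.get(table)
        if entry is None:
            return None
        if entry.cleanup_id != current_cleanup_id(table_props):
            # A cleanup ran after the listing was cached, so files may be gone.
            del self._entries[table]
            return None
        return entry.files
```

A caller would re-list the table directory whenever get returns None and then call put again. Because the marker only changes when a cleanup runs, source and target clusters could each bump it independently after their own compaction and cleanup, which fits the replication requirement described above.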