[ https://issues.apache.org/jira/browse/HIVE-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sushanth Sowmyan reassigned HIVE-14841:
---------------------------------------

    Assignee: Sushanth Sowmyan

> Replication - Phase 2
> ---------------------
>
> Key: HIVE-14841
> URL: https://issues.apache.org/jira/browse/HIVE-14841
> Project: Hive
> Issue Type: New Feature
> Components: repl
> Affects Versions: 2.1.0
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
>
> Per email sent out to the dev list, the current implementation of replication
> in Hive has certain drawbacks, for instance:
> * Replication follows a rubberbanding pattern, wherein different
> tables/partitions can be in a different/mixed state on the destination, so
> that unless all events are caught up on, we do not have an equivalent
> warehouse. Thus, this only satisfies DR cases, not load-balancing use cases,
> and the secondary warehouse is really only seen as a backup, rather than as
> a live warehouse that trails the primary.
> * The base implementation is a naive implementation, and has several
> performance problems, including a large amount of duplication of data for
> subsequent events, as mentioned in HIVE-13348, and having to copy out entire
> partitions/tables when just a delta of files might be sufficient, etc. Also,
> using EXPORT/IMPORT allows us a simple implementation, but at the cost of
> tons of temporary space, much of which is not actually applied at the
> destination.
> Thus, to track this, we now create a new branch (repl2) and an uber-jira
> (this one) to track experimental development towards improvement of this
> situation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)