With no objections received, I have created a new branch called repl2, along with a new umbrella jira (HIVE-14841) and a jira component (repl) to track continued development.
Thanks,
-Sushanth

On Thu, Sep 22, 2016 at 10:03 AM, Sushanth Sowmyan <khorg...@gmail.com> wrote:
> Hi Folks,
>
> We had some work done with replication back at HIVE-7973 and this
> implemented a primary mode of replication for hive which can integrate
> with tools like Falcon. I intend to move forward on continuing to
> improve this, to fix some of the major problems with the current
> implementation, mostly the following:
>
> a) Replication follows a rubberbanding pattern, wherein different
> tables/ptns can be in a different/mixed state on the destination, so
> that unless all events are caught up on, we do not have an equivalent
> warehouse. Thus, this only satisfies DR cases, not load balancing
> use cases, and the secondary warehouse is really only seen as a backup,
> rather than as a live warehouse that trails the primary.
> b) The base implementation is a naive implementation, and has several
> performance problems, including a large amount of duplication of data
> for subsequent events, as mentioned in HIVE-13348, having to copy out
> entire partitions/tables when just a delta of files might be
> sufficient/etc. Also, using EXPORT/IMPORT allows us a simple
> implementation, but at the cost of tons of temporary space, much of
> which is not actually applied at the destination.
>
> To that end, I want to create a new branch, so that we can track
> development on this end on public apache jira. The last time I worked
> on this, having a private branch meant large uber patches as in
> HIVE-10227, which I would like to avoid this time, and is also more
> in keeping with open-development. Also, developing in master itself is
> not a good idea, since some of the ideas I'm trying out can be
> experimental, and probably still a ways from maturity.
>
> So, unless anyone has any objection, I would like to create a new
> branch off master, say "repl2" and create an uber jira to manage
> individual components of the work.
>
> Thanks,
> -Sushanth