Hello Hive users,

After attending the Hive meetup yesterday (huge thanks to the organizers!),
I thought that perhaps many organizations were maintaining their own Hive 2
and 3 branches by backporting important patches to vanilla Hive. Ideally it
would be great if all the important patches were regularly merged to Hive 2
and 3 branches (e.g., branch-2.3 and branch-3.1), but I guess this would
take a lot of time and effort on the Hive committer side, and it also seems
like at the moment, most of the efforts are directed at the master branch.

I find backporting patches to the Hive 2 and 3 branches to be challenging
and time-consuming, especially for "outsiders" who have not implemented or
reviewed the patches. The problem is two-fold: 1) you have to decide which
patches to apply and in what order; 2) you have to run all the tests to
make sure that the new patches are compatible with the code base and do not
introduce new bugs.

1) is not easy because sometimes a patch from the master branch fails to
merge because of missing dependencies. In such a case, you have to go back
through the commit history, identify the commits it depends on, and merge
them first. Depending on how extensive the changes in the patch are, this
can be a big pain.
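
To make the ordering problem concrete, here is a minimal sketch (not a tool
we actually use) of how one could compute a valid order once the
dependencies between patches have been identified by hand. It assumes
Python 3.9+ for the standard graphlib module; the patch ids and the
backport_order helper are hypothetical placeholders for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def backport_order(depends_on):
    """Return patch ids so that every patch comes after its dependencies.

    depends_on maps a patch id to the set of patch ids that must be
    applied before it. graphlib.CycleError is raised if the
    dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical patch ids, for illustration only.
deps = {
    "PATCH-C": {"PATCH-A", "PATCH-B"},  # C needs both A and B
    "PATCH-B": {"PATCH-A"},             # B needs A
    "PATCH-A": set(),                   # A applies cleanly on its own
}

order = backport_order(deps)
print(order)
```

This only automates the easy part. Finding out which commits a patch
depends on still requires reading the commit history.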

2) can also be a problem if applying a new patch produces different test
results. Sometimes a patch merges with no conflicts, but some tests fail.
Besides, running the tests themselves may take a lot of time.

So, I wonder if anyone could share their experience and wisdom on how to
maintain Hive 2 and 3 branches, or share their git repos. We have applied
about 210 patches to Hive 3.1.3 (since Nov 2, 2020), and are in the middle
of applying an additional 100+ patches. You can find our work at the
following repo. (You can ignore the last commit, which is internal to our
work.)

https://github.com/mr3project/hive-mr3/commits/master3

Thanks,

--- Sungwoo Park
