Greetings, Hive Dev.

In the past few months, my colleagues and I have been trying to roll Hive 0.13 
out for wider use on Yahoo's Hadoop clusters. A "challenging" endeavour, shall 
we say.

Back when we were rolling out Hive 0.12, in spite of basing our builds on the 
Apache Hive 0.12 release branch, we ran into *several* problems that we 
wouldn't want to roll into production. (These have variously involved the ORC 
file-format, dynamic partitioning, metastore performance, query-plan 
serialization, and so on.) On the bright side, most of what we ran into was 
already found and rectified on trunk (now Hive 0.13). But those fixes didn't 
uniformly make it back to branch-0.12, which was then the current stable 
release. I fear we're now repeating this with Hive 0.13.

We've found that keeping up with the fixes on trunk is like trying to board a 
moving train while also trying to pull our shoes on. Patches often don't 
cleanly apply back to a release-branch because of unrelated changes on trunk. 
They sometimes depend silently on changes elsewhere. When we're lucky, tests 
fail. And when we're not, things go hilariously pear-shaped in production. 
Permit me the temerity to make the following suggestion:

1. For P1 bugs (i.e. involving data corruption, service unavailability, or 
serious failures without reasonable workarounds), along with a fix for trunk, I 
move that the current stable release branch also be patched. This will be much 
easier to accomplish alongside the trunk fix than months down the line.
2. Of *course*, this doesn't apply to new features on trunk.

I realize this will involve a greater commitment from us (the community), 
certainly more effort for testing. But it'll ensure that the current stable 
Hive release is both current and stable. =]

Thoughts?

Mithun
