Hi,

I have a question about using Hive ACID on Hive 3.x against cloud blob
stores, and would be much obliged if someone could answer it.

As I understand it, the results of a compaction (major or minor) need to
become visible atomically, so that when both uncompacted and compacted
directories are present, the reader can pick the compacted ones.

To illustrate my upcoming question, please consider the following example.
Two delta directories exist:
delta_41_41
delta_40_40

After minor compaction:
delta_40_41
delta_41_41
delta_40_40

The reader will pick delta_40_41 as its range encompasses the rest, and
ignore delta_41_41 and delta_40_40.
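
To make the selection rule concrete, here is a rough Python sketch of how I
picture it. This is a toy model and not Hive's actual reader logic: the
function names are made up for illustration, and it only handles the plain
delta_<min>_<max> names used in this example, ignoring any extra suffixes
real directory names may carry.

```python
import re
from typing import List, Tuple

# Matches only the simplified names used in this example, e.g. delta_40_41.
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def parse_delta(name: str) -> Tuple[int, int]:
    """Return the (min, max) write-id range encoded in a delta directory name."""
    m = DELTA_RE.match(name)
    if m is None:
        raise ValueError(f"not a delta directory: {name}")
    return int(m.group(1)), int(m.group(2))


def select_deltas(names: List[str]) -> List[str]:
    """Keep only deltas whose range is not enclosed by a wider delta."""
    ranges = {n: parse_delta(n) for n in names}
    selected = []
    for name, (lo, hi) in ranges.items():
        covered = any(
            (olo, ohi) != (lo, hi) and olo <= lo and hi <= ohi
            for other, (olo, ohi) in ranges.items()
            if other != name
        )
        if not covered:
            selected.append(name)
    # Sort by range so the result is deterministic.
    return sorted(selected, key=lambda n: ranges[n])


before = select_deltas(["delta_41_41", "delta_40_40"])
after = select_deltas(["delta_40_41", "delta_41_41", "delta_40_40"])
print(before)  # both original deltas are read
print(after)   # only the compacted delta is read
```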

However, for this to work correctly, the compacted directory must become
visible atomically, i.e. it should never be the case that some files in the
compacted directory are visible while others are not. This works fine on
HDFS, where renaming a directory is atomic. But on cloud blob stores a
rename is actually a copy followed by a delete, so wouldn't the compacted
directory be visible when only a subset (even just one) of its files has
been copied? And wouldn't that lead to wrong results, since the reader
would pick the incompletely copied compacted delta directory? Or have I
understood this incorrectly?
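
The scenario I am worried about can be mimicked with a small in-memory
simulation in Python. Everything here is an assumption made for
illustration: the blob store is just a dict of keys, the tbl/_tmp staging
path is hypothetical, and rename_by_copy is my own stand-in for a
copy-then-delete rename, not any real connector's implementation.

```python
from typing import Dict, Iterator, List


def list_dir(store: Dict[str, bytes], prefix: str) -> List[str]:
    """List the keys under a 'directory' prefix, as a reader would see them."""
    return sorted(k for k in store if k.startswith(prefix + "/"))


def rename_by_copy(store: Dict[str, bytes], src: str, dst: str) -> Iterator[None]:
    """Emulate a non-atomic rename: copy each object, then delete the sources.

    Yields after every copy so a caller can observe the intermediate state.
    """
    for key in list_dir(store, src):
        store[dst + key[len(src):]] = store[key]
        yield
    for key in list_dir(store, src):
        del store[key]


# Hypothetical layout: the compactor wrote two files to a staging location.
store = {
    "tbl/_tmp/delta_40_41/bucket_00000": b"rows-a",
    "tbl/_tmp/delta_40_41/bucket_00001": b"rows-b",
}

steps = rename_by_copy(store, "tbl/_tmp/delta_40_41", "tbl/delta_40_41")
next(steps)  # only the first object has been copied so far

# A reader listing the table now sees delta_40_41 with one of its two files.
partial = list_dir(store, "tbl/delta_40_41")

for _ in steps:  # let the rename finish
    pass
complete = list_dir(store, "tbl/delta_40_41")

print(len(partial), len(complete))
```

In this model, a reader that lists the table between the first and second
copy sees delta_40_41 holding only part of the compacted data, which is
exactly the window I am asking about.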

To me it looks like this problem will be solved by
https://issues.apache.org/jira/browse/HIVE-20823, but until then, is this
broken, or have I missed a crucial detail?

PS: I found https://issues.apache.org/jira/browse/HIVE-20392... is this
trying to solve this exact problem?

Thanks,
Abhishek
