Hi,

I have a question about using Hive ACID on Hive 3.x against cloud blob stores, and would be much obliged if someone could answer it.

As I understand it, the results of a compaction (major or minor) need to become visible atomically, so that when both uncompacted and compacted directories are present, the reader can pick the compacted ones. To illustrate my upcoming question, please consider the following example.

Two delta directories exist:

    delta_41_41
    delta_40_40

After minor compaction:

    delta_40_41
    delta_41_41
    delta_40_40

The reader will pick delta_40_41, as its range encompasses the others, and ignore delta_41_41 and delta_40_40.

However, for this to work correctly, the premise is that the compacted directory becomes visible atomically, i.e. it should never be the case that some files in the compacted directory are visible while others are not. This works fine on HDFS, as the rename of a directory is atomic. But on cloud blob stores, where a rename is actually a copy followed by a delete, wouldn't the compacted directory be visible even when only a subset (even just one) of its files has been copied? And wouldn't that lead to wrong results, as the reader would pick the incompletely copied compacted delta directory? Or have I understood this incorrectly?

To me it looks like this problem will be solved by https://issues.apache.org/jira/browse/HIVE-20823, but until then, is this broken, or have I missed a crucial detail?

PS: I found https://issues.apache.org/jira/browse/HIVE-20392... is this trying to solve this exact problem?

Thanks,
Abhishek
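
To make the scenario concrete, here is a minimal Python sketch of the selection rule as I described it above: a delta directory is ignored when another directory's write-ID range fully contains it. This is only an illustration of my understanding, not Hive's actual reader code. The names parse_delta and select_deltas are hypothetical, and the sketch assumes plain delta_<min>_<max> names, whereas real directory names may carry extra suffixes that it ignores.

```python
import re
from typing import List, Tuple

# Assumption: names look exactly like delta_<min>_<max>; any extra
# suffixes that real Hive directories may carry are not handled here.
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def parse_delta(name: str) -> Tuple[int, int]:
    """Return the (min, max) write-ID range encoded in a delta directory name."""
    match = DELTA_RE.match(name)
    if match is None:
        raise ValueError(f"not a delta directory name: {name}")
    return int(match.group(1)), int(match.group(2))


def select_deltas(names: List[str]) -> List[str]:
    """Keep only the deltas whose range is not contained in a wider delta."""
    ranges = {name: parse_delta(name) for name in names}
    selected = []
    for name, (lo, hi) in ranges.items():
        covered = any(
            (other_lo, other_hi) != (lo, hi)
            and other_lo <= lo
            and hi <= other_hi
            for other, (other_lo, other_hi) in ranges.items()
            if other != name
        )
        if not covered:
            selected.append(name)
    # Return in write-ID order so the result is deterministic.
    return sorted(selected, key=lambda n: ranges[n])


if __name__ == "__main__":
    print(select_deltas(["delta_41_41", "delta_40_40"]))
    print(select_deltas(["delta_40_41", "delta_41_41", "delta_40_40"]))
```

Note that the rule looks only at directory names. It has no way to tell whether delta_40_41 has been fully written, which is exactly why a non-atomic copy-then-delete rename worries me.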