Hello,

When Hive MapReduce jobs create HDFS output files, they name them using this pattern:

    000000_0.gz
    000000_0.gz_copy_1
    000000_0.gz_copy_2
    000000_0.gz_copy_3
    ...

This seems like it could grow into a very long list over time. In fact, a comment in the code says "leave the below loop for now until a better approach is found":

https://github.com/apache/hive/blob/758ff449099065a84c46d63f9418201c8a6731b1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3710

Would it be problematic to simply prefix the file name with a random number or a timestamp to make it unique? That would save the code from having to loop, asking the FileSystem (NameNode) "is copy 1 there?", "is copy 2 there?", "is copy 3 there?", and so on.

Thanks.
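
P.S. To make the question concrete, here is a rough sketch of the two naming strategies. It is not Hive's actual code: FakeFs is a hypothetical in-memory stand-in for the NameNode, and nextCopyName and uniqueName are names I made up for illustration. The UUID prefix is just one possible way to get a unique name; a timestamp or random number would play the same role.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class CopyNameSketch {

    // Hypothetical stand-in for the NameNode: a set of file names,
    // plus a counter of how many existence checks were made.
    static final class FakeFs {
        final Set<String> files = new HashSet<>();
        int existsCalls = 0;

        boolean exists(String name) {
            existsCalls++;
            return files.contains(name);
        }

        void create(String name) {
            files.add(name);
        }
    }

    // Probing strategy: try the base name, then base_copy_1, base_copy_2, ...
    // Each probe costs one existence check.
    static String nextCopyName(FakeFs fs, String base) {
        String candidate = base;
        for (int i = 1; fs.exists(candidate); i++) {
            candidate = base + "_copy_" + i;
        }
        return candidate;
    }

    // Unique-prefix strategy (assumption: a random UUID is "unique enough"),
    // so no existence check is made at all.
    static String uniqueName(String base) {
        return UUID.randomUUID() + "_" + base;
    }

    public static void main(String[] args) {
        FakeFs probing = new FakeFs();
        for (int n = 0; n < 4; n++) {
            probing.create(nextCopyName(probing, "000000_0.gz"));
        }
        System.out.println("probing: " + probing.existsCalls + " exists() calls");

        FakeFs prefixed = new FakeFs();
        for (int n = 0; n < 4; n++) {
            prefixed.create(uniqueName("000000_0.gz"));
        }
        System.out.println("prefixed: " + prefixed.existsCalls + " exists() calls");
    }
}
```

In this toy model, placing four files with the probing loop takes 1 + 2 + 3 + 4 = 10 existence checks, and the count keeps growing quadratically as copies accumulate, while the prefixed variant makes none.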