Hi devs,

In HadoopOutputFormat.close(), I see code that tries to rename <outputPath>/tmp-r-00001 to <outputPath>/1.
But when I run my Flink 1.9.2 code using a local MiniCluster, the tmp-r-00001 file actually ends up at:

<outputPath>/_temporary/0/task__0000_r_000001/tmp-r-00001

I think this is because, with algorithm version 1 (the default), Hadoop's FileOutputCommitter leaves committed task output in task-specific subdirectories. It relies on a post-completion "merge paths" step, which in Hadoop is performed by the Application Master. I assume that on a real cluster, the call to commitJob() in HadoopOutputFormat.finalizeGlobal() would do this, but it doesn't seem to happen when I run locally.

If I set the algorithm version to 2, FileOutputCommitter does the "merge paths" step immediately, and the HadoopOutputFormat code finds the files in the expected location.

I'm wondering whether Flink should always use version 2 of the algorithm, since it performs better when a job writes many output files (which is why it was added).

Thanks,

-- Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
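To make the v1/v2 difference concrete, here is a toy Java model that uses only java.nio.file. It is a sketch of the directory moves described in this thread, not Hadoop's FileOutputCommitter code. The task-attempt id, the single task, and the flat (non-recursive) merge are simplifying assumptions, and the attempt-directory layout is assumed to mirror Hadoop's. In a real job, version 2 is selected by setting mapreduce.fileoutputcommitter.algorithm.version to 2 in the Hadoop configuration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

/** Toy model of FileOutputCommitter's v1 vs v2 file movement; NOT Hadoop's actual code. */
public class CommitterModel {
    static final String TASK = "task__0000_r_000001";          // task dir name seen in the local run
    static final String ATTEMPT = "attempt__0000_r_000001_0";  // hypothetical task-attempt id

    // Assumed pre-commit location: <out>/_temporary/0/_temporary/<attempt>
    static Path attemptDir(Path out) {
        return out.resolve("_temporary").resolve("0").resolve("_temporary").resolve(ATTEMPT);
    }

    // Where v1 parks committed task output: <out>/_temporary/0/<task>
    static Path committedTaskDir(Path out) {
        return out.resolve("_temporary").resolve("0").resolve(TASK);
    }

    // Simulates one reduce task writing tmp-r-00001 and then committing it.
    static void runTask(Path out, int algorithmVersion) throws IOException {
        Path attempt = attemptDir(out);
        Files.createDirectories(attempt);
        Files.writeString(attempt.resolve("tmp-r-00001"), "data");
        if (algorithmVersion == 1) {
            // v1 commitTask: rename the attempt dir; the file stays under _temporary
            Files.move(attempt, committedTaskDir(out));
        } else {
            // v2 commitTask: merge the files straight into the output dir
            mergeInto(attempt, out);
        }
    }

    // commitJob: v1 merges the committed task dir into <out>; both versions then drop _temporary
    static void commitJob(Path out, int algorithmVersion) throws IOException {
        if (algorithmVersion == 1 && Files.isDirectory(committedTaskDir(out))) {
            mergeInto(committedTaskDir(out), out);
        }
        deleteRecursively(out.resolve("_temporary"));
    }

    // Flat merge: move every entry of src into dst (the real mergePaths also recurses)
    static void mergeInto(Path src, Path dst) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
            for (Path f : files) {
                Files.move(f, dst.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) return;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order deletes children before their parents
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try { Files.delete(p); } catch (IOException e) { throw new UncheckedIOException(e); }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        for (int v : new int[] {1, 2}) {
            Path out = Files.createTempDirectory("out-v" + v);
            runTask(out, v);
            System.out.println("v" + v + " after commitTask, file in <outputPath>: "
                    + Files.exists(out.resolve("tmp-r-00001")));
            commitJob(out, v);
            System.out.println("v" + v + " after commitJob, file in <outputPath>: "
                    + Files.exists(out.resolve("tmp-r-00001")));
        }
    }
}
```

Under v1 the file only reaches <outputPath> once commitJob() runs, which matches what I see locally if that step never happens; under v2 it is already there right after the task commits.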