>Is there anyway to avoid creating sub-directories while running in tez?
>Or this is by design and can not be changed?

Yes, this is by design.

The Tez execution of UNION is entirely parallel & the task-ids overlap, so the files created have to have unique names. But the total task counts for "Map 1" and "Map 2" are only available as the job runs, so the two branches can't be handed non-overlapping file names up front. Instead, each one writes to its own dir.

Here's a comparison of MapReduce vs Tez (from 2014, some slides are out of date now).

http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey/15

This UNION method is faster because of fewer intermediate HDFS writes, & mapreduce.input.fileinputformat.input.dir.recursive=true kicks in as long as your cluster runs YARN (which it does, because otherwise Tez wouldn't work), so the sub-directories get read along with everything else.

Cheers,
Gopal
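
PS: a toy illustration of the naming problem, in case it helps. This is only a local-filesystem sketch in Python, not what Hive or Tez actually run. The file-name pattern, the sub-directory names "1" and "2", and the helper names (write_task_output, count_files, simulate) are all assumptions made up for the example. It shows two branches whose task-ids both start at 0: written into one dir they overwrite each other, written into per-branch dirs they don't, but then a non-recursive listing of the top-level dir finds nothing.

```python
import os
import tempfile


def write_task_output(base_dir, subdir, task_id):
    """Write one task's output file; return True if it overwrote an existing file."""
    target = os.path.join(base_dir, subdir) if subdir else base_dir
    os.makedirs(target, exist_ok=True)
    # Hypothetical naming scheme: the file name depends only on the task id.
    path = os.path.join(target, "%06d_0" % task_id)
    clobbered = os.path.exists(path)
    with open(path, "w") as out:
        out.write("rows from task %d\n" % task_id)
    return clobbered


def count_files(base_dir, recursive):
    """Count files under base_dir, optionally descending into sub-directories."""
    if recursive:
        return sum(len(names) for _, _, names in os.walk(base_dir))
    return sum(1 for entry in os.scandir(base_dir) if entry.is_file())


def simulate(tasks_per_branch=2):
    """Compare one shared output dir against one sub-directory per UNION branch."""
    branches = ("1", "2")  # assumed sub-directory names, one per branch
    with tempfile.TemporaryDirectory() as flat, \
            tempfile.TemporaryDirectory() as nested:
        clobbered = 0
        # Both branches number their tasks from 0, so names collide in one dir.
        for _branch in branches:
            for task_id in range(tasks_per_branch):
                clobbered += write_task_output(flat, None, task_id)
        # With a sub-directory per branch, the same names no longer collide.
        for branch in branches:
            for task_id in range(tasks_per_branch):
                write_task_output(nested, branch, task_id)
        return {
            "flat_clobbered": clobbered,
            "flat_files": count_files(flat, recursive=True),
            "nested_files_recursive": count_files(nested, recursive=True),
            "nested_files_top_level": count_files(nested, recursive=False),
        }


if __name__ == "__main__":
    print(simulate())
```

The last two counts are the reason the recursive input setting matters: a reader that only looks at the top-level dir would miss the files sitting one level down.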