Is there a way to prevent LOAD DATA LOCAL INPATH from appending _copy_1 to the names of logs that already exist in a partition? If the log is already in HDFS/Hive, I'd rather the load fail and give me a return code or output saying that the log already exists.
For example, if I run these queries:

    /usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_a.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"
    /usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"
    /usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"
    /usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"

I end up with:

    test_a.bz2
    test_b.bz2
    test_b_copy_1.bz2
    test_b_copy_2.bz2

However, if I use OVERWRITE it will nuke all the data in the partition (including test_a.bz2) and I end up with just:

    test_b.bz2

I recall that older versions of hive would not do this. How do I handle this case? Is there a safe atomic way to do this?

Sean
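
One possible workaround, sketched here rather than offered as a confirmed fix, is to check for the file in the partition directory before calling LOAD DATA and to return a distinct exit code if it is already there. The sketch assumes the table is a managed table stored under the default warehouse path (/user/hive/warehouse/<table>/ds=.../hr=...); the WAREHOUSE_DIR value and the safe_load helper are hypothetical names, so adjust them to your setup. Note that this is check-then-load, so it is not atomic: two loaders running at the same moment could both pass the check.

```shell
#!/bin/sh
# Sketch only: refuse to load a file whose name already exists in the partition.
# ASSUMPTION: managed table under the default warehouse dir; verify the real
# location with DESCRIBE FORMATTED or hive.metastore.warehouse.dir.
HADOOP="${HADOOP:-hadoop}"
HIVE="${HIVE:-/usr/local/hive/bin/hive}"
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/user/hive/warehouse}"

# safe_load FILE TABLE DS HR   (hypothetical helper)
# Returns 2 if the file name is already present in the partition directory,
# otherwise returns whatever exit status hive returns for the LOAD DATA call.
safe_load() {
    file="$1"; table="$2"; ds="$3"; hr="$4"
    target="$WAREHOUSE_DIR/$table/ds=$ds/hr=$hr/$(basename "$file")"

    # "hadoop fs -test -e PATH" exits 0 when PATH exists.
    if "$HADOOP" fs -test -e "$target"; then
        echo "already exists, not loading: $target" >&2
        return 2
    fi

    "$HIVE" -e "LOAD DATA LOCAL INPATH '$file' INTO TABLE $table PARTITION(ds='$ds', hr='$hr')"
}
```

With this in place, `safe_load test_b.bz2 logs 2012-03-19 23` would load the file the first time and return 2 on later attempts, leaving test_a.bz2 untouched. If concurrent loaders are possible, some external lock around the call would still be needed.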