Is there a way to prevent LOAD DATA LOCAL INPATH from appending _copy_1 to logs 
that already exist in a partition?  If the log is already in HDFS/Hive, I'd 
rather it fail and give me a return code or output saying that the log already 
exists.

For example, if I run these queries:
/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_a.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"
/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"
/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"
/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"

I end up with:
test_a.bz2
test_b.bz2
test_b_copy_1.bz2
test_b_copy_2.bz2

However, if I use OVERWRITE, it will nuke all the data in the partition 
(including test_a.bz2) and I end up with just:
test_b.bz2

I recall that older versions of Hive would not do this.  How do I handle this 
case?  Is there a safe, atomic way to do this?
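
The only workaround I can think of is a pre-check wrapper, roughly like the 
sketch below.  The load_once function, its return code 2, and the 
WAREHOUSE_DIR/HADOOP_BIN/HIVE_BIN variables are my own hypothetical names, and 
it assumes the table lives under the default warehouse directory 
(/user/hive/warehouse), which may not match every setup.  It is also not 
atomic: another loader could add the same file between the check and the load.

```shell
#!/bin/sh
# Hypothetical guard: refuse to load a file whose name already exists
# in the target partition directory.
# Assumption: the table's data lives under the default warehouse path.
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/user/hive/warehouse}"
HADOOP_BIN="${HADOOP_BIN:-hadoop}"
HIVE_BIN="${HIVE_BIN:-/usr/local/hive/bin/hive}"

# load_once FILE TABLE DS HR
# Returns 2 if the file is already in the partition; otherwise returns
# whatever exit status the hive command produces.
load_once() {
    file=$1
    table=$2
    ds=$3
    hr=$4
    target="$WAREHOUSE_DIR/$table/ds=$ds/hr=$hr/$(basename "$file")"

    # `hadoop fs -test -e PATH` exits 0 when PATH exists.
    if "$HADOOP_BIN" fs -test -e "$target"; then
        echo "refusing to load: $target already exists" >&2
        return 2
    fi

    "$HIVE_BIN" -e "LOAD DATA LOCAL INPATH '$file' INTO TABLE $table PARTITION(ds='$ds', hr='$hr')"
}
```

With that, something like `load_once test_b.bz2 logs 2012-03-19 23` would 
load the file the first time and return 2 on later runs, but I would still 
prefer a check that happens inside the load itself.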

Sean






