[ https://issues.apache.org/jira/browse/HIVE-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Chauhan updated HIVE-2889:
-----------------------------------
    Fix Version/s:     (was: 0.9.0)

Unlinking from 0.9

> LOAD DATA IF NOT EXISTS functionality
> -------------------------------------
>
>                 Key: HIVE-2889
>                 URL: https://issues.apache.org/jira/browse/HIVE-2889
>             Project: Hive
>          Issue Type: Improvement
>          Components: Import/Export
>    Affects Versions: 0.8.1
>            Reporter: Sean McNamara
>
> *Background:*
> The behavior of LOAD DATA LOCAL INPATH has changed. It used to give you an
> error when trying to copy in a log that already existed. Now it renames the
> file with copy_1, so the file always goes into HDFS.
> *Original discussion:*
> http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCB8D2849.14F69%25sean.mcnamara%40webtrends.com%3E
> *Issue:*
> There is no longer an atomic way to insert files into Hive and guarantee that
> the file won't go in twice. Using OVERWRITE will cause other logs in the
> table/partition to be deleted.
> *Example:*
> {{/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_a.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"}}
> {{/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"}}
> {{/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"}}
> {{/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"}}
> *Result:*
> {{test_a.bz2}}
> {{test_b.bz2}}
> {{test_b_copy_1.bz2}}
> {{test_b_copy_2.bz2}}
> _test_b data was inserted 3 times, which is not the desired behavior in this
> instance._
> *Proposal:*
> Add an _IF NOT EXISTS_ flag to indicate copy semantics. If the log file
> does not exist in the table/partition, the log would go in normally. If the
> log does exist in the table/partition, Hive would return an error and an
> exit code.
> *Proposed HiveQL Example:*
> {{LOAD DATA LOCAL IF NOT EXISTS INPATH 'test_a.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')}}
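
The semantics requested in the proposal can be illustrated outside Hive. The following is only a minimal sketch, assuming a local directory stands in for the table partition; the function name `load_if_not_exists` and the directory layout are hypothetical and are not part of Hive or of any patch on this issue. It uses an exclusive create so that the existence check and the file creation happen in one step, and it fails instead of writing a `_copy_N` file.

```python
import os
import shutil
import sys
import tempfile


def load_if_not_exists(src_path, partition_dir):
    """Copy src_path into partition_dir unless a file of that name is already there.

    Hypothetical helper: a local directory stands in for a Hive partition.
    Raises FileExistsError rather than renaming the file to *_copy_N.
    """
    os.makedirs(partition_dir, exist_ok=True)
    dest_path = os.path.join(partition_dir, os.path.basename(src_path))
    # O_CREAT | O_EXCL fails if dest_path exists, so "check" and "create"
    # are a single step on a local filesystem (no check-then-copy race).
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    fd = os.open(dest_path, flags)
    with os.fdopen(fd, "wb") as dest, open(src_path, "rb") as src:
        shutil.copyfileobj(src, dest)
    return dest_path


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    src = os.path.join(work, "test_b.bz2")
    with open(src, "wb") as f:
        f.write(b"example log data")
    partition = os.path.join(work, "logs", "ds=2012-03-19", "hr=23")

    load_if_not_exists(src, partition)      # first load goes in normally
    try:
        load_if_not_exists(src, partition)  # second load is rejected
    except FileExistsError as err:
        print("refusing duplicate load:", err, file=sys.stderr)
        sys.exit(1)                         # error exit code, as in the proposal
```

Run as a script, the second call fails and the process exits with status 1, which is the behavior the proposal asks for when the same log is loaded twice. Whether HDFS and Hive could offer the same guarantee is exactly what this issue leaves open.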