[ https://issues.apache.org/jira/browse/HIVE-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197791#comment-14197791 ]

Sushanth Sowmyan commented on HIVE-8719:
----------------------------------------

If it matters, this can be tested using the HCat E2E test framework, with the following test conf addition

{noformat}
  {
+   'name' => 'LoadData',
+   'tests' => [
    {
+     'num' => 1,
+     'sql' => "drop table if exists load_8719;
+ dfs -mkdir /:HDFSTMP:/load_8719;
+ create external table load_8719 (name string) partitioned by (date string) location '/:HDFSTMP:/load_8719';
+ insert overwrite local directory ':OUTPATH:' select name from studenttab10k;
+ alter table load_8719 add partition(date='2014-09-16') location ':HDFSTMP:/load_8719/20140916';
+ load data local inpath ':OUTPATH:' into table load_8719 partition(date='2014-09-16');
+ describe formatted load_8719 partition(date='2014-09-16');",
+     'cmd_fail' => 1, # Isn't really expected to fail, but we want to prevent benchmark generation because we aren't interested in it.
+     'expected_out_regex' => "Location:.*/load_8719/20140916",
+   }
{noformat}

I will take on a task on another bug to try and extend the current .q test framework to be able to test this, and file a jira for that.

> LoadSemanticAnalyzer ignores previous partition location if inserting into partition that already exists
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8719
>                 URL: https://issues.apache.org/jira/browse/HIVE-8719
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-8719.patch
>
>
> LOAD DATA INSERT INTO seems to be broken currently for partitions that do not
> use hive's native directory structure naming scheme, thus ignoring any
> location previously set by an ALTER TABLE ADD PARTITION ... LOCATION ...
> command.
> Here is a simple reproducer:
> {noformat}
> echo 1 > /tmp/data1.txt
> hive -e "create external table testpart(id int) partitioned by (date string) location '/tmp/testpart';"
> hive -e "alter table testpart add partition(date='2014-09-16') location '/tmp/testpart/20140916';"
> hive -e "describe formatted testpart partition(date='2014-09-16') ;" | egrep '/tmp/testpart/(date=.?)?2014-?09-?16' > /tmp/a
> hive -e "load data local inpath '/tmp/data1.txt' into table testpart partition(date='2014-09-16');"
> hive -e "describe formatted testpart partition(date='2014-09-16') ;" | egrep '/tmp/testpart/(date=.?)?2014-?09-?16' > /tmp/b
> diff /tmp/a /tmp/b
> hadoop fs -ls /tmp/testpart/
> {noformat}
> Basically, what happens is that after the ALTER TABLE ADD PARTITION ...
> LOCATION, the location is "/tmp/testpart/20140916". After the LOAD DATA has
> run, the partition location becomes "/tmp/testpart/date=2014-09-16/". Any
> data previously present in the other location will then be ignored as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
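
The before/after comparison in the reproducer could also be written as a small shell check that looks only at the partition's Location line. This is a sketch, not part of Hive or the HCat E2E framework: the helper names partition_location and location_unchanged are made up for illustration, and it assumes that the saved "describe formatted" output contains a line whose first field is "Location:" followed by a path without spaces.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Print the path from the first "Location:" line read on stdin.
# Assumes the first whitespace-separated field is "Location:" and the
# second field is the partition path.
partition_location() {
    awk '$1 == "Location:" { print $2; exit }'
}

# Succeed only if both saved "describe formatted" outputs report the
# same, non-empty partition location.
# $1: output captured before LOAD DATA, $2: output captured after.
location_unchanged() {
    before=$(partition_location < "$1")
    after=$(partition_location < "$2")
    [ -n "$before" ] && [ "$before" = "$after" ]
}

# Possible usage, with the full "describe formatted" output saved to two
# files (names are placeholders) before and after the LOAD DATA step:
#   location_unchanged /tmp/desc_before.txt /tmp/desc_after.txt \
#       || echo "partition location changed after LOAD DATA"
```

With the bug described above, the two locations would differ ("/tmp/testpart/20140916" before, "/tmp/testpart/date=2014-09-16" after), so the check would fail.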