[ https://issues.apache.org/jira/browse/HIVE-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-8719:
-----------------------------------
    Description: 
LOAD DATA ... INTO TABLE currently seems to be broken for partitions that do 
not use Hive's native directory naming scheme: it ignores any location 
previously set by an ALTER TABLE ADD PARTITION ... LOCATION ... command.

Here is a simple reproducer:

{noformat}
echo 1 > /tmp/data1.txt
hive -e "create external table testpart(id int) partitioned by (date string) location '/tmp/testpart';"
hive -e "alter table testpart add partition(date='2014-09-16') location '/tmp/testpart/20140916';"
hive -e "describe formatted testpart partition(date='2014-09-16') ;" | egrep '/tmp/testpart/(date=.?)?2014-?09-?16' > /tmp/a
hive -e "load data local inpath '/tmp/data1.txt' into table testpart partition(date='2014-09-16');"
hive -e "describe formatted testpart partition(date='2014-09-16') ;" | egrep '/tmp/testpart/(date=.?)?2014-?09-?16' > /tmp/b
diff /tmp/a /tmp/b
hadoop fs -ls /tmp/testpart/
{noformat}

After the ALTER TABLE ADD PARTITION ... LOCATION, the partition location is 
"/tmp/testpart/20140916". After the LOAD DATA has run, the partition location 
becomes "/tmp/testpart/date=2014-09-16/". Any data previously present in the 
original location is then ignored as well.
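
The path change described above can be sketched as a small shell check. This is only an illustration under stated assumptions: the helper names default_partition_path and is_custom_location are hypothetical, the path rule is simplified (a single partition column, no escaping of special characters in the value), and nothing here calls Hive itself.

```shell
#!/bin/sh
# Hypothetical helper: builds the default Hive-style partition directory
# (<table location>/<key>=<value>) for a single partition column.
default_partition_path() {
    table_location=$1
    part_key=$2
    part_value=$3
    # Strip one trailing slash so the result never contains "//".
    printf '%s/%s=%s\n' "${table_location%/}" "$part_key" "$part_value"
}

# Hypothetical helper: succeeds when the registered partition location
# differs from the default one, i.e. when the partition is exposed to the
# behaviour reported in this issue.
is_custom_location() {
    registered=${1%/}
    default_path=$(default_partition_path "$2" "$3" "$4")
    [ "$registered" != "$default_path" ]
}

# Values taken from the reproducer in this issue.
if is_custom_location /tmp/testpart/20140916 /tmp/testpart date 2014-09-16; then
    echo "custom location: LOAD DATA may repoint the partition to $(default_partition_path /tmp/testpart date 2014-09-16)"
fi
```

With the reproducer's values, the registered location "/tmp/testpart/20140916" differs from the default "/tmp/testpart/date=2014-09-16", which is the case where the diff of /tmp/a and /tmp/b is expected to be non-empty.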

  was:
LOAD DATA ... INTO TABLE currently seems to be broken for partitions that do 
not use Hive's native directory naming scheme: it ignores any location 
previously set by an ALTER TABLE ADD PARTITION ... LOCATION ... command.

Here is a simple reproducer:

{noformat}
echo 1 > /tmp/data1.txt
hive -e "create external table testpart(id int) partitioned by (date string) location '/tmp/testpart';"
hive -e "alter table testpart add partition(date='2014-09-16') location '/tmp/testpart/20140916';"
hive -e "describe formatted testpart partition(date='2014-09-16') ;" | egrep '/tmp/testpart/(date=.?)?2014-?09-?16' > /tmp/a
hive -e "load data local inpath '/tmp/data1.txt' into table testpart partition(date='2014-09-16');"
hive -e "describe formatted testpart partition(date='2014-09-16') ;" | egrep '/tmp/testpart/(date=.?)?2014-?09-?16' > /tmp/b
diff /tmp/a /tmp/b
hadoop fs -ls /tmp/testpart/
{noformat}


> LoadSemanticAnalyzer ignores previous partition location if inserting into 
> partition that already exists
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8719
>                 URL: https://issues.apache.org/jira/browse/HIVE-8719
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>            Reporter: Sushanth Sowmyan
>         Attachments: HIVE-8719.patch
>
>
> LOAD DATA ... INTO TABLE currently seems to be broken for partitions that do 
> not use Hive's native directory naming scheme: it ignores any location 
> previously set by an ALTER TABLE ADD PARTITION ... LOCATION ... command.
> Here is a simple reproducer:
> {noformat}
> echo 1 > /tmp/data1.txt
> hive -e "create external table testpart(id int) partitioned by (date string) location '/tmp/testpart';"
> hive -e "alter table testpart add partition(date='2014-09-16') location '/tmp/testpart/20140916';"
> hive -e "describe formatted testpart partition(date='2014-09-16') ;" | egrep '/tmp/testpart/(date=.?)?2014-?09-?16' > /tmp/a
> hive -e "load data local inpath '/tmp/data1.txt' into table testpart partition(date='2014-09-16');"
> hive -e "describe formatted testpart partition(date='2014-09-16') ;" | egrep '/tmp/testpart/(date=.?)?2014-?09-?16' > /tmp/b
> diff /tmp/a /tmp/b
> hadoop fs -ls /tmp/testpart/
> {noformat}
> After the ALTER TABLE ADD PARTITION ... LOCATION, the partition location is 
> "/tmp/testpart/20140916". After the LOAD DATA has run, the partition 
> location becomes "/tmp/testpart/date=2014-09-16/". Any data previously 
> present in the original location is then ignored as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
