[jira] [Commented] (HIVE-8719) LoadSemanticAnalyzer ignores previous partition location if inserting into partition that already exists

Sushanth Sowmyan (JIRA) Tue, 04 Nov 2014 23:35:09 -0800

    [ https://issues.apache.org/jira/browse/HIVE-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197791#comment-14197791 ]


Sushanth Sowmyan commented on HIVE-8719:
----------------------------------------

If it matters, this can be tested using the HCat E2E test framework, with the 
following addition to the test conf:

{noformat}
{
+      'name' => 'LoadData',
+      'tests' => [ {
+          'num' => 1,
+          'sql' => "drop table if exists load_8719;
+                    dfs -mkdir /:HDFSTMP:/load_8719;
+                    create external table load_8719 (name string)  partitioned by (date string) location '/:HDFSTMP:/load_8719';
+                    insert overwrite local directory ':OUTPATH:' select name from studenttab10k;
+                    alter table load_8719 add partition(date='2014-09-16') location ':HDFSTMP:/load_8719/20140916';
+                    load data local inpath ':OUTPATH:' into table load_8719 partition(date='2014-09-16');
+                    describe formatted load_8719 partition(date='2014-09-16');",
+          'cmd_fail' => 1, # Isn't really expected to fail, but we want to prevent benchmark generation because we aren't interested in it.
+          'expected_out_regex' => "Location:.*/load_8719/20140916",
+      }
{noformat}

I will take on a separate task, under another bug, to try to extend the current 
.q test framework so that it can test this, and I will file a JIRA for that.
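
As a rough illustration of what the 'expected_out_regex' check is looking for, 
here is a minimal Python sketch. The pattern is copied from the test conf 
above; the two sample "describe formatted" lines, the hdfs://namenode/tmp 
prefix, and the helper name location_preserved are made up for illustration, 
and the use of an unanchored search is an assumption about how the harness 
applies the regex.

```python
import re

# Pattern copied from the 'expected_out_regex' entry in the test conf.
EXPECTED = re.compile(r"Location:.*/load_8719/20140916")

# Hypothetical "describe formatted" lines; the hdfs://namenode/tmp prefix is
# invented for illustration and is not taken from a real run.
location_kept = "Location:            hdfs://namenode/tmp/load_8719/20140916"
location_reset = "Location:            hdfs://namenode/tmp/load_8719/date=2014-09-16"


def location_preserved(describe_output: str) -> bool:
    """Return True if the output still reports the custom partition directory."""
    # Assumption: the harness does an unanchored search over the output.
    return EXPECTED.search(describe_output) is not None


print(location_preserved(location_kept))
print(location_preserved(location_reset))
```

With the bug present, the second form is what the partition would report after 
the LOAD DATA, so the regex would not match and the test would flag it.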

> LoadSemanticAnalyzer ignores previous partition location if inserting into 
> partition that already exists
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8719
>                 URL: https://issues.apache.org/jira/browse/HIVE-8719
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-8719.patch
>
>
> LOAD DATA INSERT INTO currently seems to be broken for partitions that do not 
> use Hive's native directory structure naming scheme: it ignores any 
> location previously set by an ALTER TABLE ADD PARTITION ... LOCATION ... 
> command.
> Here is a simple reproducer:
> {noformat}
> echo 1 > /tmp/data1.txt
> hive -e "create external table testpart(id int) partitioned by (date string) location '/tmp/testpart';"
> hive -e "alter table testpart add partition(date='2014-09-16')  location '/tmp/testpart/20140916';"
> hive -e "describe formatted testpart partition(date='2014-09-16') ;" | egrep '/tmp/testpart/(date=.?)?2014-?09-?16' > /tmp/a
> hive -e "load data local inpath '/tmp/data1.txt' into table testpart partition(date='2014-09-16');"
> hive -e "describe formatted testpart partition(date='2014-09-16') ;" | egrep '/tmp/testpart/(date=.?)?2014-?09-?16' > /tmp/b
> diff /tmp/a /tmp/b
> hadoop fs -ls /tmp/testpart/
> {noformat}
> Basically, after the ALTER TABLE ADD PARTITION ... LOCATION, the partition 
> location is "/tmp/testpart/20140916". After the LOAD DATA has run, the 
> partition location becomes "/tmp/testpart/date=2014-09-16/". Any data 
> already present in the original location is then ignored as well.
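
The egrep pattern in the reproducer is written to match both the custom 
directory name and Hive's native one, which is why the diff of /tmp/a and 
/tmp/b exposes the change. A minimal Python sketch of that filtering step 
follows; the pattern is the one from the reproducer, while the sample output 
lines, the file: scheme, and the helper name egrep_lines are hypothetical.

```python
import re

# Same pattern the reproducer passes to egrep.
PATTERN = re.compile(r"/tmp/testpart/(date=.?)?2014-?09-?16")


def egrep_lines(text: str) -> list[str]:
    """Keep only the lines that contain a match, as egrep would."""
    return [line for line in text.splitlines() if PATTERN.search(line)]


# Hypothetical "describe formatted" output before and after the LOAD DATA.
before_load = "Database: default\nLocation: file:/tmp/testpart/20140916\n"
after_load = "Database: default\nLocation: file:/tmp/testpart/date=2014-09-16\n"

a = egrep_lines(before_load)  # stands in for /tmp/a
b = egrep_lines(after_load)   # stands in for /tmp/b

# Both forms are captured, so a changed location shows up as a difference.
print(a)
print(b)
print("location changed" if a != b else "location unchanged")
```

If the location were preserved, both filtered outputs would be identical and 
the diff would be empty.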



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
