[jira] [Work logged] (HIVE-21564) Load data into a bucketed table is ignoring partitions specs and loads data into default partition.

ASF GitHub Bot (JIRA) Mon, 15 Apr 2019 19:30:04 -0700


     [ https://issues.apache.org/jira/browse/HIVE-21564?focusedWorklogId=228082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-228082 ]


ASF GitHub Bot logged work on HIVE-21564:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Apr/19 02:29
            Start Date: 16/Apr/19 02:29
    Worklog Time Spent: 10m 
      Work Description: sankarh commented on pull request #597: HIVE-21564: 
Load data into a bucketed table is ignoring partitions specs and loads data 
into default partition.
URL: https://github.com/apache/hive/pull/597#discussion_r275609707
 
 

 ##########
 File path: 
ql/src/test/queries/clientpositive/load_static_ptn_into_bucketed_table.q
 ##########
 @@ -0,0 +1,33 @@
+set hive.stats.column.autogather=false;
+set hive.strict.checks.bucketing=true;
+
+set hive.explain.user=false;
+
+-- SORT_QUERY_RESULTS
+
+-- Single key partition
+-- Load with full partition spec
+CREATE TABLE src_bucket_tbl(key int, value string) partitioned by (ds string) clustered by (key) into 1 buckets STORED AS TEXTFILE;
+explain load data local inpath '../../data/files/bmj/000000_0' INTO TABLE src_bucket_tbl partition(ds='2008-04-08');
+load data local inpath '../../data/files/bmj/000000_0' INTO TABLE src_bucket_tbl partition(ds='2008-04-08');
+select * from src_bucket_tbl where ds='2008-04-08';
+
+drop table src_bucket_tbl;
+
+-- Multi key partition
+-- Load with both static and dynamic partition spec
+CREATE TABLE src_bucket_tbl(key int, value string) partitioned by (hr int, ds string) clustered by (key) into 1 buckets STORED AS TEXTFILE;
 
 Review comment:
   Updated the test script as per your comment. Please check.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 228082)
    Time Spent: 0.5h  (was: 20m)

> Load data into a bucketed table is ignoring partitions specs and loads data 
> into default partition.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21564
>                 URL: https://issues.apache.org/jira/browse/HIVE-21564
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21564.01.patch, HIVE-21564.02.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When running the command below to load data into a bucketed table, the data 
> is not loaded into the specified partition; it is loaded into the default 
> partition instead.
> LOAD DATA INPATH '/tmp/files/000000_0' OVERWRITE INTO TABLE call 
> PARTITION(year_partition=2012, month=12);
> SELECT * FROM call WHERE year_partition=2012 AND month=12; --> returns 0 rows.
> {code}
> CREATE TABLE call( 
> date_time_date date, 
> ssn string, 
> name string, 
> location string) 
> PARTITIONED BY ( 
> year_partition int, 
> month int) 
> CLUSTERED BY ( 
> date_time_date) 
> SORTED BY ( 
> date_time_date ASC) 
> INTO 1 BUCKETS 
> STORED AS ORC;
> {code}
> If hive.exec.dynamic.partition is set to false, the load fails with the error below.
> {code}
> Error: Error while compiling statement: FAILED: SemanticException 1:18 
> Dynamic partition is disabled. Either enable it by setting 
> hive.exec.dynamic.partition=true or specify partition column values. Error 
> encountered near token 'month' (state=42000,code=40000)
> {code}
> When we "set hive.strict.checks.bucketing=false;", the load works fine.
> This check is a behaviour imposed by HIVE-15148 to avoid loading incorrectly 
> named data files into bucketed tables. In the customer use case, if the files 
> are named properly with the bucket_id (00000_0, 00000_1 etc), then it is safe 
> to set this flag to false.
> However, the current behaviour of loading into the default partition when 
> hive.strict.checks.bucketing=true and partitions are specified is a bug 
> introduced by HIVE-19311, where the given query is rewritten into an insert 
> query (to handle incorrect file names and ORC versions) but the rewrite fails 
> to incorporate the partition specs.
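> A minimal sketch of the workaround described above, assuming the source file 
> is already named by bucket id as noted. The table, path, and partition values 
> are the ones from this report; SHOW PARTITIONS is added here only as one way 
> to check where the data landed, and is not part of the original report.
> {code}
> -- Assumption: /tmp/files/000000_0 is correctly named for bucket 0,
> -- so disabling the strict bucketing check is safe for this load.
> SET hive.strict.checks.bucketing=false;
> 
> -- Load with a full static partition spec.
> LOAD DATA INPATH '/tmp/files/000000_0' OVERWRITE INTO TABLE call
> PARTITION(year_partition=2012, month=12);
> 
> -- Check that rows are visible under the requested partition.
> SELECT * FROM call WHERE year_partition=2012 AND month=12;
> 
> -- List the partitions to confirm no default partition was created.
> SHOW PARTITIONS call;
> {code}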



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
