[ https://issues.apache.org/jira/browse/HIVE-24163?focusedWorklogId=487020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487020 ]
ASF GitHub Bot logged work on HIVE-24163:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 21/Sep/20 15:58
Start Date: 21/Sep/20 15:58
Worklog Time Spent: 10m
Work Description: kuczoram commented on pull request #1507:
URL: https://github.com/apache/hive/pull/1507#issuecomment-696209500

The file listing in the Utilities.getFullDPSpecs method was not correct for MM tables and for ACID tables when direct insert was on. This method returned all partitions of these tables, not just the ones affected by the current query. Because of this, the lineage information for dynamic-partitioning inserts into such tables was incorrect. I compared it with the lineage information produced when inserting into external tables: for external tables, only the partitions affected by the query are present. This is because for external tables the data is first written into the staging dir when inserting into the table, so when the partitions are listed, this directory is checked and it contains only the newly inserted data. But for MM tables and for ACID tables with direct insert the staging dir is missing, so the table directory is checked instead and everything in it gets listed.
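To make the scenario concrete, here is a minimal HiveQL sketch of a dynamic-partitioning insert into an explicitly insert-only (MM) table; the table name, column, and values are illustrative only and are not taken from the PR:

{code:java}
-- Illustration only: declare an insert-only (MM) transactional table explicitly.
CREATE TABLE mm_part (id INT)
  PARTITIONED BY (p STRING)
  STORED AS TEXTFILE
  TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');

-- Fully dynamic partition columns typically require nonstrict mode.
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE mm_part PARTITION (p) VALUES (1, 'a');  -- creates partition p=a
INSERT INTO TABLE mm_part PARTITION (p) VALUES (2, 'b');  -- this query only writes p=b

-- Before the fix, the lineage reported for the second INSERT could also list the
-- pre-existing partition p=a, because Utilities.getFullDPSpecs listed the whole
-- table directory instead of only the partitions written by the current query.
{code}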
----------------------------------------------------------------

Issue Time Tracking
-------------------

Worklog Id: (was: 487020)
Time Spent: 20m (was: 10m)

> Dynamic Partitioning Insert fail for MM table fail during MoveTask
> ------------------------------------------------------------------
>
> Key: HIVE-24163
> URL: https://issues.apache.org/jira/browse/HIVE-24163
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Rajkumar Singh
> Assignee: Marta Kuczora
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.1.2
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> -- DDLs and Query
> {code:java}
> create table `class` (name varchar(8), sex varchar(1), age double precision, height double precision, weight double precision);
> insert into table class values ('RAJ','MALE',28,12,12);
> CREATE TABLE `PART1` (`id` DOUBLE, `N` DOUBLE, `Name` VARCHAR(8), `Sex` VARCHAR(1)) PARTITIONED BY (Weight string, Age string, Height string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE;
> INSERT INTO TABLE `part1` PARTITION (`Weight`,`Age`,`Height`) SELECT 0, 0, `Name`,`Sex`,`Weight`,`Age`,`Height` FROM `class`;
> {code}
> It fails during the MoveTask execution:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: partition hdfs://hostname:8020/warehouse/tablespace/managed/hive/part1/.hive-staging_hive_2020-09-02_13-29-58_765_4475282758764123921-1/-ext-10000/tmpstats-0_FS_3 is not a directory!
>     at org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:2769) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:2837) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.exec.MoveTask.handleDynParts(MoveTask.java:562) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:440) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) ~[hive-service-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> {code}
> The reason is that the task writes the fsstat file while the FileSinkOperator is being closed. HS2 then runs the MoveTask to move the data into the destination partition directories, and while collecting the partition locations Hive checks whether each entry under the load path is a directory; that check fails on the stats file.
> -- Hive sets the stats location in
> https://github.com/apache/hive/blob/d700ea54ec5da5364d92a9faaa58f89ea03181e0/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L8135
> which is relative to the hive-staging directory:
> https://github.com/apache/hive/blob/fecad5b0f72c535ed1c53f2cc62b0d6649b651ae/ql/src/java/org/apache/hadoop/hive/ql/Context.java#L617
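For illustration, here is a minimal sketch of the kind of check that produces the error above. It is not the actual code of Hive.getValidPartitionsInPath (the class, method, and exception type below are made up, and the real implementation walks the partition-spec depth rather than a single listing); it only shows why a stats side-file such as tmpstats-0_FS_3 under the load path trips a "not a directory" check:

{code:java}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionDirCheckSketch {
  /**
   * A dynamic-partition load expects only partition directories
   * (e.g. weight=12/age=28/height=12) under the load path, so any plain file
   * found there, such as a tmpstats-* file written by the FileSinkOperator,
   * fails the check.
   */
  static void checkOnlyPartitionDirs(FileSystem fs, Path loadPath) throws Exception {
    for (FileStatus status : fs.listStatus(loadPath)) {
      if (!status.isDirectory()) {
        throw new IllegalStateException(
            "partition " + status.getPath() + " is not a directory!");
      }
    }
  }
}
{code}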