[ https://issues.apache.org/jira/browse/HIVE-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15724727#comment-15724727 ]
Hive QA commented on HIVE-15367:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12841872/HIVE-15367.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 38 failed/errored test(s), 10768 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[crtseltbl_serdeprops] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[materialized_view_authorization_sqlstd] (batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[materialized_view_create] (batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[materialized_view_describe] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[materialized_view_drop] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_ctas] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=150)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[ctas_noemptyfolder] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[materialized_view_authorization_create_no_grant] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[materialized_view_authorization_create_no_select_perm] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[materialized_view_authorization_drop_other] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[materialized_view_authorization_no_select_perm] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[materialized_view_delete] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[materialized_view_drop2] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[materialized_view_drop] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[materialized_view_insert] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[materialized_view_load] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[materialized_view_replace_with_view] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[materialized_view_update] (batchId=84)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join22] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join29] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby5_map] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby6] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join_array] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_distinct] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[merge1] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[multi_insert_move_tasks_share_dependencies] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[runtime_skewjoin_mapjoin_spark] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[sample5] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[stats0] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union8] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_mapjoin] (batchId=116)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2434/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2434/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2434/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 38 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12841872 - PreCommit-HIVE-Build

> CTAS with LOCATION should write temp data under location directory rather
> than database location
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15367
>                 URL: https://issues.apache.org/jira/browse/HIVE-15367
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15367.1.patch
>
>
> For regular CTAS queries, temp data from a SELECT query will be written to
> a staging directory under the database location. The code to control this is
> in {{SemanticAnalyzer.java}}
> {code}
>     // allocate a temporary output dir on the location of the table
>     String tableName = getUnescapedName((ASTNode) ast.getChild(0));
>     String[] names = Utilities.getDbTableName(tableName);
>     Path location;
>     try {
>       Warehouse wh = new Warehouse(conf);
>       //Use destination table's db location.
>       String destTableDb = qb.getTableDesc() != null?
>           qb.getTableDesc().getDatabaseName(): null;
>       if (destTableDb == null) {
>         destTableDb = names[0];
>       }
>       location = wh.getDatabasePath(db.getDatabase(destTableDb));
>     } catch (MetaException e) {
>       throw new SemanticException(e);
>     }
> {code}
> However, CTAS queries allow specifying a {{LOCATION}} for the new table. It's
> possible for this location to be on a different filesystem than the database
> location. If this happens, temp data will be written to the database
> filesystem and then copied to the table filesystem in {{MoveTask}}.
> This extra copying of data can drastically affect performance. Rather than
> always using the database location as the staging dir for CTAS queries, Hive
> should first check if there is an explicit {{LOCATION}} specified in the CTAS
> query. If there is, staging data should be stored under the {{LOCATION}}
> directory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
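
The staging-directory choice proposed in the quoted issue description can be sketched as follows. This is only an illustration of the idea, not the contents of HIVE-15367.1.patch: the class {{CtasStagingSketch}} and both of its methods are hypothetical, and the sketch uses plain {{java.net.URI}} values in place of Hive's {{Warehouse}} and {{Path}} classes. The example URIs are made up.

```java
import java.net.URI;

public class CtasStagingSketch {

    // Hypothetical helper: prefer the explicit LOCATION of the CTAS target
    // (null when the query has no LOCATION clause), else the database location.
    static URI chooseStagingParent(URI databaseLocation, URI explicitTableLocation) {
        return explicitTableLocation != null ? explicitTableLocation : databaseLocation;
    }

    // Hypothetical helper: two locations are treated as being on the same
    // filesystem when scheme and authority match. A cross-filesystem move is
    // the case where the final move becomes a copy.
    static boolean sameFileSystem(URI a, URI b) {
        return equalsIgnoreCaseOrNull(a.getScheme(), b.getScheme())
            && equalsIgnoreCaseOrNull(a.getAuthority(), b.getAuthority());
    }

    private static boolean equalsIgnoreCaseOrNull(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }

    public static void main(String[] args) {
        // Assumed example locations, not taken from the issue.
        URI dbLocation = URI.create("hdfs://namenode/warehouse/sales.db");
        URI tableLocation = URI.create("s3a://example-bucket/tables/t1");

        URI stagingParent = chooseStagingParent(dbLocation, tableLocation);
        System.out.println("staging parent: " + stagingParent);
        System.out.println("move stays on one filesystem: "
            + sameFileSystem(stagingParent, tableLocation));
    }
}
```

With staging data placed under the explicit location, the staging directory and the final table directory share a filesystem, so the extra cross-filesystem copy described in the issue would not be needed.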