[ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548763#comment-15548763 ]
Hive QA commented on HIVE-14474:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12831727/HIVE-14474.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10656 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1403/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1403/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1403/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12831727 - PreCommit-HIVE-Build

> Create datasource in Druid from Hive
> ------------------------------------
>
>                 Key: HIVE-14474
>                 URL: https://issues.apache.org/jira/browse/HIVE-14474
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-14474.01.patch, HIVE-14474.02.patch, HIVE-14474.03.patch, HIVE-14474.04.patch, HIVE-14474.patch
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In the initial implementation proposed in this issue, we will write the results of the query to HDFS (or the location specified in the CTAS statement) and submit a HadoopIndexing task to the Druid overlord. The task will contain the path where the data was stored; the indexing job will read that data and create the segments in Druid. Once this is done, the results are removed from Hive.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "my_query_based_datasource")
> AS <input_query>;
> {code}
> This statement stores the results of query <input_query> in a Druid datasource named 'my_query_based_datasource'. One of the columns of the query needs to be the time dimension, which is mandatory in Druid. In particular, we follow the same convention used by Druid: the result of the executed query needs to contain a column named '\_\_time', which will act as the time dimension column in Druid. Currently, the time dimension column needs to be of 'timestamp' type (see the example at the end of this description).
> This initial implementation interacts with the Druid API as it is currently exposed to the user. In a follow-up issue, we should propose an implementation that integrates more tightly with Druid. In particular, we would like to store segments directly in Druid from Hive, thus avoiding the overhead of writing Hive results to HDFS and then launching an MR job that basically reads them again to create the segments.
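> For illustration, a minimal sketch of such a CTAS statement; the source table 'wiki_events' and its columns are hypothetical, chosen only to show how a query supplies the mandatory '\_\_time' column:
> {code:sql}
> -- Hypothetical source table:
> --   wiki_events(event_time string, page string, user_id string, added int)
> CREATE TABLE druid_wiki
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "wiki_datasource")
> AS
> SELECT
>   CAST(event_time AS timestamp) AS `__time`, -- mandatory time dimension, 'timestamp' type
>   page,                                      -- becomes a Druid dimension
>   user_id,                                   -- becomes a Druid dimension
>   added                                      -- becomes a Druid metric
> FROM wiki_events;
> {code}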