[ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554295#comment-15554295 ]
Jesus Camacho Rodriguez commented on HIVE-14474:
------------------------------------------------

[~ashutoshc], it is up to date; it is just that the initial commit was 24 days ago, and then I amended it... :)

> Create datasource in Druid from Hive
> ------------------------------------
>
>                 Key: HIVE-14474
>                 URL: https://issues.apache.org/jira/browse/HIVE-14474
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-14474.01.patch, HIVE-14474.02.patch, HIVE-14474.03.patch, HIVE-14474.04.patch, HIVE-14474.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In the initial implementation proposed in this issue, we will write the results of the query to HDFS (or to the location specified in the CTAS statement) and submit a Hadoop indexing task to the Druid overlord. The task will contain the path where the data was stored; it will read that data and create the segments in Druid. Once this is done, the results are removed from Hive.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "my_query_based_datasource")
> AS <input_query>;
> {code}
> This statement stores the results of the query <input_query> in a Druid datasource named 'my_query_based_datasource'. One of the columns of the query needs to be the time dimension, which is mandatory in Druid. In particular, we use the same convention that is used in Druid: there needs to be a column named '\_\_time' in the result of the executed query, which will act as the time dimension column in Druid. Currently, the time dimension column needs to be of 'timestamp' type.
> This initial implementation interacts with the Druid API as it is currently exposed to the user. In a follow-up issue, we should propose an implementation that integrates more tightly with Druid. In particular, we would like to store segments directly in Druid from Hive, thus avoiding the overhead of writing Hive results to HDFS and then launching an MR job that basically reads them again to create the segments.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
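
For illustration, a CTAS that follows the time-dimension convention described above might look as follows. This is a hypothetical sketch: the target table, datasource name, source table, and column names are made up for the example; only the storage handler class, the TBLPROPERTIES key, and the '\_\_time' convention come from the description.

{code:sql}
-- Hypothetical example: 'sales' and its columns are illustrative only.
-- The first projected column is cast to timestamp and aliased to `__time`
-- so that it can act as the mandatory Druid time dimension.
CREATE TABLE druid_sales
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "sales_datasource")
AS
SELECT CAST(sold_at AS timestamp) AS `__time`,
       store_id,
       product_id,
       amount
FROM sales;
{code}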