[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

ASF GitHub Bot (JIRA) Mon, 28 Nov 2016 11:43:25 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702905#comment-15702905
 ]


ASF GitHub Bot commented on HIVE-15277:
---------------------------------------

GitHub user b-slim opened a pull request:

    https://github.com/apache/hive/pull/120

    HIVE-15277 Druid storage handler

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/b-slim/hive rebase_druid_record_writer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/120.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #120
    
----
commit 9025d4a33348faa007c17f2c7ff5dee4f3a87318
Author: Slim Bouguerra <slim.bougue...@gmail.com>
Date:   2016-10-26T23:55:34Z

    adding druid record writer
    
    bump guava version to 16.0.1
    
    moving out the injector

commit be2e29dcba5617db478eefa75a5478a77512e090
Author: Jesus Camacho Rodriguez <jcama...@apache.org>
Date:   2016-11-02T03:21:59Z

    Druid time granularity partitioning, serializer and necessary extensions

commit df4036f7f76294dc5599d29cdb760336b0ee9a4f
Author: Jesus Camacho Rodriguez <jcama...@apache.org>
Date:   2016-11-02T19:59:52Z

    Recognition of dimensions and metrics
    
    patch 1

commit ea76f0ddfa33990d92e061676123c45920ed6dce
Author: Slim Bouguerra <slim.bougue...@gmail.com>
Date:   2016-11-02T21:18:00Z

    adding file schema support

commit 010701be7cf939f6854c9ee113ccf40b20aed32a
Author: Jesus Camacho Rodriguez <jcama...@apache.org>
Date:   2016-11-04T19:48:43Z

    native storage
    
    new fixes

commit 3d8496299d1d151da59bb6f547ebbc475c329197
Author: Slim Bouguerra <slim.bougue...@gmail.com>
Date:   2016-11-09T17:57:03Z

    using segment output path

commit 2b10b26eb7a5d9a6058c9e1f206c599e54ec88b2
Author: Slim Bouguerra <slim.bougue...@gmail.com>
Date:   2016-11-16T00:16:10Z

    adding check for existing datasource and implement drop table

commit e18b716a438e8b38155d4ab31b7070ae1945f1e4
Author: Slim Bouguerra <slim.bougue...@gmail.com>
Date:   2016-11-19T00:53:10Z

    adding UTs and refactor some code

commit 3b31d16dcb9fd5cdb9eb6d1c994cb3f0c8cd8a33
Author: Slim Bouguerra <slim.bougue...@gmail.com>
Date:   2016-11-23T23:49:28Z

    fix druid version

commit 4b447e56389aab1f45e9b48192068d1a0257a14c
Author: Slim Bouguerra <slim.bougue...@gmail.com>
Date:   2016-11-28T19:32:02Z

    ignore record writer test

commit a7b4f792a5e28b0772addbc0d5ea52d5b44d9d91
Author: Slim Bouguerra <slim.bougue...@gmail.com>
Date:   2016-11-28T19:38:25Z

    format code

----


> Teach Hive how to create/delete Druid segments 
> -----------------------------------------------
>
>                 Key: HIVE-15277
>                 URL: https://issues.apache.org/jira/browse/HIVE-15277
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: slim bouguerra
>            Assignee: slim bouguerra
>         Attachments: HIVE-15277.2.patch, HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation, Hive generates the Druid segment files and inserts
> the metadata that signals the handoff to Druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, 
> `metric2`....>;
> {code}
> This statement stores the results of the input SELECT query in a Druid 
> datasource named 'datasourcename'. One of the query's columns must be the 
> time dimension, which is mandatory in Druid. We follow the Druid convention: 
> the result of the executed query must contain a column named '__time', which 
> acts as the time dimension column in Druid. Currently, the time dimension 
> column must be of type 'timestamp'.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid clearly separates dimensions from metrics, so if a 
> Hive column contains numbers and needs to be presented as a dimension, use 
> the cast operator to cast it to string. 
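> As a minimal sketch of that cast (the source table `sales`, its columns 
> `ts`, `store_id`, `product` and `amount`, and the datasource name are 
> hypothetical and not part of this patch), a numeric `store_id` could be 
> exposed as a dimension like this:
> {code:sql}
> -- Hypothetical source table:
> --   sales(ts timestamp, store_id int, product string, amount double)
> CREATE TABLE druid_sales
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "sales_datasource")
> AS
> SELECT
>   -- timestamp column renamed to the mandatory time dimension
>   `ts` AS `__time`,
>   -- numeric column cast to string so that it is treated as a dimension
>   CAST(`store_id` AS STRING) AS `store_id`,
>   -- string column, treated as a dimension
>   `product`,
>   -- double column, treated as a metric
>   `amount`
> FROM sales;
> {code}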
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid. The user needs to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid
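> A sketch of the removal side, assuming the table created by the CTAS 
> statement above and assuming that dropping it in Hive is what triggers the 
> removal from Druid (the exact cleanup done on the Druid side is not spelled 
> out here):
> {code:sql}
> -- Drop the Hive table backed by the Druid datasource 'datasourcename'
> DROP TABLE druid_table_1;
> {code}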



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
