[ https://issues.apache.org/jira/browse/HIVE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jesus Camacho Rodriguez resolved HIVE-14468.
--------------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

Pushed in HIVE-14217.

> Implement Druid query based input format
> ----------------------------------------
>
>                 Key: HIVE-14468
>                 URL: https://issues.apache.org/jira/browse/HIVE-14468
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>             Fix For: 2.2.0
>
>
> The input format is responsible for generating the splits and creating the
> record readers.
> * For *Timeseries*, *TopN*, and *GroupBy* queries: create a single split
> containing the broker address and the query. The record reader then submits
> the query to the broker, retrieves the results, parses them, and generates
> records.
> * For *Select* queries: Druid has the concept of a threshold (limit) in the
> Select query, which is used to retrieve the query results over multiple
> requests. Hence, we emit a Druid Segment Metadata query to obtain the
> number of rows in the datasource. Then we create _number of rows /
> default\_threshold_ splits, where _default\_threshold_ is a Hive configuration
> property defined as {{hive.druid.select.threshold}}. Each generated split
> contains the broker address and a Select JSON query with a _start_ and _end_
> date range (currently we assume a uniform distribution of records across the
> time dimension). The splits are handled independently by the record readers,
> each of which submits its query to the broker, retrieves the results, parses
> them, and generates records. This way we can parallelize the retrieval of
> results for these queries.
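
The split computation described for Select queries can be sketched roughly as follows. This is a minimal illustration, not the code committed in HIVE-14217: the class name SelectSplitSketch, the method splitInterval, and the use of epoch-millisecond ranges are all hypothetical, and rounding the split count up (so a partial last chunk still gets a split) is an assumption the description does not spell out. It relies on the same uniform-distribution assumption stated above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: divides a datasource's time interval into equal
// sub-ranges, one per split, assuming rows are uniformly distributed in time.
final class SelectSplitSketch {

  /**
   * Returns one {start, end} millisecond range per split.
   *
   * @param startMillis start of the datasource interval (inclusive)
   * @param endMillis   end of the datasource interval (exclusive)
   * @param numRows     row count, e.g. as reported by a Segment Metadata query
   * @param threshold   value of hive.druid.select.threshold
   */
  static List<long[]> splitInterval(long startMillis, long endMillis,
                                    long numRows, int threshold) {
    if (threshold <= 0 || endMillis <= startMillis) {
      throw new IllegalArgumentException("invalid threshold or interval");
    }
    long span = endMillis - startMillis;

    // Assumption: ceiling division, with at least one split for empty data.
    long numSplits = Math.max(1L, (numRows + threshold - 1) / threshold);
    // Never create more splits than there are milliseconds to divide.
    numSplits = Math.min(numSplits, span);

    long step = span / numSplits;
    List<long[]> ranges = new ArrayList<>();
    for (long i = 0; i < numSplits; i++) {
      long lo = startMillis + i * step;
      // The last range absorbs any remainder so the whole interval is covered.
      long hi = (i == numSplits - 1) ? endMillis : lo + step;
      ranges.add(new long[] {lo, hi});
    }
    return ranges;
  }
}
```

Each returned range would then be embedded in its own Select JSON query, together with the broker address, so that the record readers can fetch their portions in parallel.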