[ https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HIVE-21934 started by Jesus Camacho Rodriguez. ------------------------------------------------------ > Materialized view on top of Druid not pushing everything > -------------------------------------------------------- > > Key: HIVE-21934 > URL: https://issues.apache.org/jira/browse/HIVE-21934 > Project: Hive > Issue Type: Improvement > Components: Druid integration, Materialized views > Reporter: slim bouguerra > Assignee: Jesus Camacho Rodriguez > Priority: Major > Labels: pull-request-available > Attachments: HIVE-21934.patch > > Time Spent: 10m > Remaining Estimate: 0h > > The title is not very informative, but examples hopefully are. > this is the plan with the view > {code} > explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`, > CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`, > SUM(1) AS `sum_number_of_records_ok`, > YEAR(`dates_n1`.`__time`) AS `yr___time_ok` > FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0` > JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON > (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`) > JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON > (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`) > JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON > (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`) > JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON > (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`) > GROUP BY MONTH(`dates_n1`.`__time`), > CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT), > YEAR(`dates_n1`.`__time`) > INFO : Starting task [Stage-3:EXPLAIN] in serial mode > INFO : Completed executing > command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977); > Time taken: 0.005 seconds > INFO : OK > +----------------------------------------------------+ > | Explain | > +----------------------------------------------------+ > | Plan optimized by CBO. | > | | > | Vertex dependency in root stage | > | Reducer 2 <- Map 1 (SIMPLE_EDGE) | > | | > | Stage-0 | > | Fetch Operator | > | limit:-1 | > | Stage-1 | > | Reducer 2 vectorized, llap | > | File Output Operator [FS_13] | > | Select Operator [SEL_12] (rows=300018951 width=38) | > | Output:["_col0","_col1","_col2","_col3"] | > | Group By Operator [GBY_11] (rows=300018951 width=38) | > | > Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, > KEY._col1, KEY._col2 | > | <-Map 1 [SIMPLE_EDGE] vectorized, llap | > | SHUFFLE [RS_10] | > | PartitionCols:_col0, _col1, _col2 | > | Group By Operator [GBY_9] (rows=600037902 width=38) | > | > Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, > _col1, _col2 | > | Select Operator [SEL_8] (rows=600037902 width=38) | > | Output:["_col0","_col1","_col2"] | > | TableScan [TS_0] (rows=600037902 width=38) | > | > mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"} > | > | | > +----------------------------------------------------+ > > {code} > if i use a simple druid table without MV > {code} > explain SELECT MONTH(`__time`) AS `mn___time_ok`, > CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`, > SUM(1) AS `sum_number_of_records_ok`, > YEAR(`__time`) AS `yr___time_ok` > FROM `druid_ssb.ssb_druid_100` > GROUP BY MONTH(`__time`), > CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT), > YEAR(`__time`); > {code} > {code} > +----------------------------------------------------+ > | Explain | > +----------------------------------------------------+ > | Plan optimized by CBO. | > | | > | Stage-0 | > | Fetch Operator | > | limit:-1 | > | Select Operator [SEL_1] | > | Output:["_col0","_col1","_col2","_col3"] | > | TableScan [TS_0] | > | > Output:["extract_month","vc","$f3","extract_year"],properties:\{"druid.fieldNames":"extract_month,vc,extract_year,$f3","druid.fieldTypes":"int,bigint,int,bigint","druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_ssb.ssb_druid_100\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_month\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"M\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}},\{\"type\":\"default\",\"dimension\":\"vc\",\"outputName\":\"vc\",\"outputType\":\"LONG\"},\{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_year\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"yyyy\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}}],\"virtualColumns\":[\{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"CAST(((CAST((timestamp_extract(\\\"__time\\\",'MONTH','America/New_York') > - 1), 'DOUBLE') / CAST(3, 'DOUBLE')) + CAST(1, 'DOUBLE')), > 'LONG')\",\"outputType\":\"LONG\"}],\"limitSpec\":\{\"type\":\"default\"},\"aggregations\":[\{\"type\":\"longSum\",\"name\":\"$f3\",\"expression\":\"1\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"} > | > | | > +----------------------------------------------------+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)