Hi Jason, could you please provide the full log since sending query to and getting result back? The key information is which cuboid is used for the query, cuboid exact match or fuzzy match, how many records be scanned and how long it tooks; Thanks.
2016-08-03 23:19 GMT+08:00 Jason Hale <[email protected]>: > Yes, it would have to do post-aggregation in that case, but the strange > thing is that query was running fast (about 1 second), while queries with > more dimensions, such as "SELECT SUM(clicks) FROM reporting GROUP BY > site_id, child_id, report_date, hotel_id". This query will take about 106 > seconds, but it shouldn't need to do any post-aggregation so I would think > it should return much quicker than that from the respective cuboid. > > Here's the explain plan: > OLAPToEnumerableConverter > OLAPProjectRel(EXPR$0=[$4]) > OLAPAggregateRel(group=[{0, 1, 2, 3}], EXPR$0=[SUM($4)]) > OLAPProjectRel(SITE_ID=[$9], CHILD_ID=[$3], REPORT_DATE=[$0], > HOTEL_ID=[$2], CLICKS=[$10]) > OLAPTableScan(table=[[DEFAULT, HPA_REPORTING2]], fields=[[0, 1, 2, 3, 4, 5, > 6, 7, 8, 9, 10, 11]]) > > On Tue, Aug 2, 2016 at 7:46 PM, ShaoFeng Shi <[email protected]> > wrote: > > > In the cube definition, you defined "SITE_ID", "CHILD_ID" as "Mandatory" > > dimension, which means they will not be aggregated in cube build phase > for > > all combinations. > > > > So when you run a query like "SELECT SUM(clicks) FROM reporting GROUP BY > > search_type", Kylin will use the combination "SITE_ID" + "CHILD_ID" + > > "SEARCH_TYPE" to serve, there will be post-aggregation in runtime; The > > performance is much depent on the cardinality of "SITE_ID" and > "CHILD_ID". > > > > > > 2016-08-02 23:08 GMT+08:00 Jason Hale <[email protected]>: > > > > > I've looked over the optimization options before, but did not notice > the > > > rowkey ordering. I can try this and see if this helps me. This is the > > only > > > thing I see that I can attempt to optimize further in the design, but > > I'll > > > provide my cube design below. I only have one measure to keep it > simple: > > > > > > { > > > "uuid": "4090b854-8f0c-4288-bd73-fc50238a6030", > > > "version": "1.5.2", > > > "name": "hpa_reporting2_cube", > > > "description": "", > > > "dimensions": [ > > > { > > > "name": "DEFAULT.HPA_REPORTING2.REPORT_DATE", > > > "table": "DEFAULT.HPA_REPORTING2", > > > "column": "REPORT_DATE", > > > "derived": null > > > }, > > > { > > > "name": "DEFAULT.HPA_REPORTING2.SEARCH_TYPE", > > > "table": "DEFAULT.HPA_REPORTING2", > > > "column": "SEARCH_TYPE", > > > "derived": null > > > }, > > > { > > > "name": "DEFAULT.HPA_REPORTING2.HOTEL_ID", > > > "table": "DEFAULT.HPA_REPORTING2", > > > "column": "HOTEL_ID", > > > "derived": null > > > }, > > > { > > > "name": "DEFAULT.HPA_REPORTING2.CHILD_ID", > > > "table": "DEFAULT.HPA_REPORTING2", > > > "column": "CHILD_ID", > > > "derived": null > > > }, > > > { > > > "name": "DEFAULT.HPA_REPORTING2.COUNTRY", > > > "table": "DEFAULT.HPA_REPORTING2", > > > "column": "COUNTRY", > > > "derived": null > > > }, > > > { > > > "name": "DEFAULT.HPA_REPORTING2.DEVICE_TYPE", > > > "table": "DEFAULT.HPA_REPORTING2", > > > "column": "DEVICE_TYPE", > > > "derived": null > > > }, > > > { > > > "name": "DEFAULT.HPA_REPORTING2.STAY_LENGTH", > > > "table": "DEFAULT.HPA_REPORTING2", > > > "column": "STAY_LENGTH", > > > "derived": null > > > }, > > > { > > > "name": "DEFAULT.HPA_REPORTING2.TRUE_RANK_AG", > > > "table": "DEFAULT.HPA_REPORTING2", > > > "column": "TRUE_RANK_AG", > > > "derived": null > > > }, > > > { > > > "name": "DEFAULT.HPA_REPORTING2.ROOM_BUNDLE", > > > "table": "DEFAULT.HPA_REPORTING2", > > > "column": "ROOM_BUNDLE", > > > "derived": null > > > }, > > > { > > > "name": "DEFAULT.HPA_REPORTING2.SITE_ID", > > > "table": "DEFAULT.HPA_REPORTING2", > > > "column": "SITE_ID", > > > "derived": null > > > } > > > ], > > > "measures": [ > > > { > > > "name": "_COUNT_", > > > "function": { > > > "expression": "COUNT", > > > "parameter": { > > > "type": "constant", > > > "value": "1", > > > "next_parameter": null > > > }, > > > "returntype": "bigint" > > > }, > > > "dependent_measure_ref": null > > > }, > > > { > > > "name": "CLICKS", > > > "function": { > > > "expression": "SUM", > > > "parameter": { > > > "type": "column", > > > "value": "CLICKS", > > > "next_parameter": null > > > }, > > > "returntype": "decimal" > > > }, > > > "dependent_measure_ref": null > > > } > > > ], > > > "rowkey": { > > > "rowkey_columns": [ > > > { > > > "column": "REPORT_DATE", > > > "encoding": "dict", > > > "isShardBy": false > > > }, > > > { > > > "column": "SEARCH_TYPE", > > > "encoding": "dict", > > > "isShardBy": false > > > }, > > > { > > > "column": "HOTEL_ID", > > > "encoding": "dict", > > > "isShardBy": false > > > }, > > > { > > > "column": "CHILD_ID", > > > "encoding": "dict", > > > "isShardBy": false > > > }, > > > { > > > "column": "COUNTRY", > > > "encoding": "dict", > > > "isShardBy": false > > > }, > > > { > > > "column": "DEVICE_TYPE", > > > "encoding": "dict", > > > "isShardBy": false > > > }, > > > { > > > "column": "STAY_LENGTH", > > > "encoding": "dict", > > > "isShardBy": false > > > }, > > > { > > > "column": "TRUE_RANK_AG", > > > "encoding": "dict", > > > "isShardBy": false > > > }, > > > { > > > "column": "ROOM_BUNDLE", > > > "encoding": "dict", > > > "isShardBy": false > > > }, > > > { > > > "column": "SITE_ID", > > > "encoding": "dict", > > > "isShardBy": false > > > } > > > ] > > > }, > > > "signature": "KixlKWxevr6mO+UlSaR5ig==", > > > "last_modified": 1470070273935, > > > "model_name": "hpa_reporting_model2", > > > "null_string": null, > > > "hbase_mapping": { > > > "column_family": [ > > > { > > > "name": "F1", > > > "columns": [ > > > { > > > "qualifier": "M", > > > "measure_refs": [ > > > "_COUNT_", > > > "CLICKS" > > > ] > > > } > > > ] > > > } > > > ] > > > }, > > > "aggregation_groups": [ > > > { > > > "includes": [ > > > "REPORT_DATE", > > > "SEARCH_TYPE", > > > "HOTEL_ID", > > > "CHILD_ID", > > > "COUNTRY", > > > "DEVICE_TYPE", > > > "STAY_LENGTH", > > > "TRUE_RANK_AG", > > > "ROOM_BUNDLE", > > > "SITE_ID" > > > ], > > > "select_rule": { > > > "hierarchy_dims": [], > > > "mandatory_dims": [ > > > "SITE_ID", > > > "CHILD_ID" > > > ], > > > "joint_dims": [ > > > [ > > > "ROOM_BUNDLE", > > > "TRUE_RANK_AG" > > > ] > > > ] > > > } > > > } > > > ], > > > "notify_list": [], > > > "status_need_notify": [ > > > "ERROR", > > > "DISCARDED", > > > "SUCCEED" > > > ], > > > "partition_date_start": 0, > > > "partition_date_end": 3153600000000, > > > "auto_merge_time_ranges": [ > > > 604800000, > > > 2419200000 > > > ], > > > "retention_range": 0, > > > "engine_type": 2, > > > "storage_type": 2, > > > "override_kylin_properties": {} > > > } > > > > > > On Mon, Aug 1, 2016 at 8:02 PM, ShaoFeng Shi <[email protected]> > > > wrote: > > > > > > > Hi Jason, > > > > > > > > As Yiming mentioned, the cube design matters for the performance of > > both > > > > build and query; please check "Optimize Cube" in the document web > page > > > and > > > > do optimizaiton as much as possible; > > > > > > > > Besides, the cluster's capacity and Hadoop configuration is also an > > > > important factor; Try to identify the bottleneck and then optimize or > > add > > > > capacity. > > > > > > > > From 1.5 Kylin ships with two cubing algorithm; the steps "Build > > > > N-Dimension Cuboid" are the legacy "Layered" cubing algorithm; They > > will > > > be > > > > skipped when Kylin selects to use the new "Fast" cubing algorithm, > > which > > > is > > > > the "Build Cube" step after them. Please click the hadoop link in > that > > > step > > > > to inspect the MR job's statistics; > > > > > > > > Hope this helps to some extend; > > > > > > > > > > > > > > > > 2016-08-02 8:44 GMT+08:00 Yiming Liu <[email protected]>: > > > > > > > > > Hi Jason, > > > > > > > > > > Cube design is the performance key for Kylin, not only query, but > > also > > > > cube > > > > > building process. How to select dimensions, how to define the > > > > relationship > > > > > between dimensions, how to select encode method, how to define > > measure, > > > > > even how to choose the Hbase key order will have a significant > impact > > > on > > > > > performance. There are quite a few wonderful documents introducing > > how > > > > to > > > > > do this, http://kylin.apache.org/docs15/ . > > > > > > > > > > One more thing, if you could share your cube design, you would get > > help > > > > > easier here. > > > > > > > > > > 2016-08-02 7:20 GMT+08:00 Jason Hale <[email protected]>: > > > > > > > > > > > I'm setting up a test case for a portion of our dataset, to > > evaluate > > > > > Kylin > > > > > > and I'm not seeing the performance that I would expect. > > > > > > > > > > > > The cube building process is taking about 5-6 hours with > > ~69,000,000 > > > > > > records and 10 dimensions. I'm not sure if that's the expected > > build > > > > > time, > > > > > > but the other problem is the query performance after building the > > > cube. > > > > > > > > > > > > All queries were tested with a very simple query (e.g. SELECT > > > > SUM(clicks) > > > > > > FROM reporting GROUP BY search_type) > > > > > > > > > > > > Grouping by 1 or 2 dimensions gives me very responsive queries > > > (under 2 > > > > > > seconds), but adding more dimensions drastically increases the > > query > > > > time > > > > > > (over 1 minute and it times out through hbase). I would expect > > these > > > > > > queries to have all similar query times since they should query > the > > > > > > respective cuboid, so I'm not sure why the performance would > > suffer. > > > I > > > > > > didn't set up any special rules for the cube, but during the > build > > > > > process > > > > > > it showed all the N-dimension cubes and the log simply said > > > 'skipped'. > > > > > > > > > > > > Is there something I'm missing in the configuration? > > > > > > > > > > > > I have a HDP cluster with 3 nodes and 1 client node on which > Kylin > > is > > > > > > installed. Do I need to adjust the hadoop configuration. I'm > using > > > most > > > > > of > > > > > > the default HDP settings. > > > > > > > > > > > > What more information can I provide? > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > With Warm regards > > > > > > > > > > Yiming Liu (刘一鸣) > > > > > > > > > > > > > > > > > > > > > -- > > > > Best regards, > > > > > > > > Shaofeng Shi > > > > > > > > > > > > > > > -- > > Best regards, > > > > Shaofeng Shi > > > -- Best regards, Shaofeng Shi
