Hi Roberto,

Glad to hear from you. Actually, I do not have any high-cardinality dimensions in my case; the largest cardinality is around 400. I was wondering how much the accuracy of count distinct matters. I have already set all dimensions in the lookup tables as derived dimensions.

What I am curious about is the relation between the number of dimensions and the build speed, and also the relation between the count distinct accuracy and the build speed.

Thanks,
Zhuoran

From: Roberto Tardío Olmos [mailto:[email protected]]
Sent: April 27, 2017 19:08
To: [email protected]
Subject: Re: A problem in cube building time

Hi Zhuoran,

I faced a similar problem with cube building time. I think it depends on the cardinality of the 2 dimensions you added. If either of them has a high cardinality (e.g. in my use case the Customer dimension has about 500,000 rows), the number of combinations Kylin needs to build for the cube increases a lot.

Some things you could try to reduce cube building time and size:

* Define all dimension table attributes as Derived Dimensions. In this case you cannot use the Hierarchy optimization in the Agg Group. Queries that use derived attributes will be slower than with Agg Group hierarchies (Normal Dimensions), but in some cases the difference in query latency is acceptable (in my case between 2 and 6 seconds more, depending on the query). Cube size and building time will be reduced.
* Use "Shard By" in the Rowkey for high-cardinality dimensions. I have not been able to test it yet, but as indicated at https://kylin.apache.org/docs16/howto/howto_optimize_build.html it should work fine. This helps to reduce cube building time.

I hope this helps; I'm also learning to use Kylin.

Kind Regards,

On 27/04/2017 at 12:46, 吕卓然 wrote:

Hi all,

Currently I am using Kylin 1.6.1 and I face a problem with cube building time. I have one fact table and two lookup tables. When I set 13 normal dimensions, 15 derived dimensions and two measures (count and count distinct), step 3 of the build takes around 20 minutes and the entire build takes around 1 hour. This is good.

However, when I increase this to 15 normal dimensions, 15 derived dimensions and two measures (count and count distinct), step 3 of the build takes around 240 minutes and the entire build takes forever. BTW, I have a hierarchy dimension which has 4 normal dimensions. I am really confused about this.
Do 13 normal dimensions become a bottleneck in cube building?

Thanks a lot!
Zhuoran

--
Roberto Tardío Olmos
Senior Big Data & Business Intelligence Consultant
Avenida de Brasil, 17, Planta 16.
28020 Madrid
Landline: 91.788.34.10
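
A rough sketch of why two extra normal dimensions can matter so much: with no optimization, Kylin builds on the order of 2^N cuboids for N normal dimensions, and a hierarchy of k dimensions contributes k + 1 combinations instead of 2^k. The function below (`cuboid_count` is a hypothetical helper, not a Kylin API) applies that arithmetic to the 13- and 15-dimension cubes from this thread. It assumes a single aggregation group and ignores mandatory and joint dimensions, the foreign-key columns behind derived dimensions, and any other pruning, so the numbers are only indicative.

```python
def cuboid_count(normal_dims, hierarchy_sizes=()):
    """Estimate the cuboids in one aggregation group (illustrative only).

    normal_dims     -- total normal dimensions, including hierarchy members
    hierarchy_sizes -- number of dimensions in each declared hierarchy
    """
    free_dims = normal_dims - sum(hierarchy_sizes)
    if free_dims < 0:
        raise ValueError("hierarchies contain more dimensions than the cube")

    # Each free dimension is either present or absent in a cuboid.
    count = 2 ** free_dims

    # A hierarchy A > B > C only allows {}, A, AB, ABC: k + 1 choices.
    for size in hierarchy_sizes:
        count *= size + 1
    return count


# Cubes from the thread, with and without the 4-dimension hierarchy declared.
for dims in (13, 15):
    print(dims, "dims, no hierarchy:", cuboid_count(dims))
    print(dims, "dims, 4-level hierarchy:", cuboid_count(dims, (4,)))
```

Under these assumptions, going from 13 to 15 normal dimensions multiplies the cuboid count by 4, whether or not the 4-dimension hierarchy is declared in the aggregation group.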
