Hi Roberto,

Glad to hear from you. Actually, I do not have any high-cardinality dimensions 
in my case; the largest cardinality is around 400. I was wondering how much 
the accuracy of count distinct matters. I have already set all dimensions in 
the lookup tables to derived dimensions. What I am curious about is the 
relation between the number of dimensions and the build speed, and also the 
relation between the count distinct accuracy and the build speed.

Thanks,
Zhuoran


From: Roberto Tardío Olmos [mailto:[email protected]]
Sent: 27 April 2017 19:08
To: [email protected]
Subject: Re: A problem in cube building time


Hi Zhuoran,

I faced a similar problem with cube building time. I think it depends on the 
cardinality of the 2 dimensions you added. If either of them has a high 
cardinality (e.g. in my use case the Customer dimension has about 500,000 
rows), the number of combinations Kylin needs to build for the cube increases 
a lot.
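
As a rough illustration of how the number of combinations (cuboids) grows 
with the number of normal dimensions, here is a minimal sketch. It assumes 
the usual upper bound of 2^N cuboids for N independent normal dimensions, 
and that a hierarchy of k levels contributes only k + 1 combinations instead 
of 2^k. The function name cuboid_count is hypothetical, and the sketch 
assumes the 4-level hierarchy from the original question is counted among 
the normal dimensions. It ignores the foreign keys behind derived dimensions 
and any other Agg Group pruning, so the real numbers will differ. Note that 
cardinality does not change the cuboid count itself; it affects how many 
rows each cuboid holds.

```python
def cuboid_count(free_dims, hierarchy_sizes=()):
    """Upper bound on cuboids: 2**free_dims, times (k + 1) per k-level hierarchy."""
    total = 2 ** free_dims
    for k in hierarchy_sizes:
        # A k-level hierarchy only allows its prefixes: none, A, A+B, ...
        total *= k + 1
    return total


# Assumption: the 4-level hierarchy is part of the 13 (or 15) normal dimensions.
before = cuboid_count(13 - 4, [4])  # 13 normal dimensions
after = cuboid_count(15 - 4, [4])   # 15 normal dimensions

print(before, after, after // before)
```

Under these assumptions, each added independent dimension doubles the number 
of cuboids, so going from 13 to 15 normal dimensions means roughly four times 
as many combinations to build.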

Some things you could try to reduce cube building time and size:

  *   Define all dimension table attributes as Derived Dimensions. In this 
case you cannot use the Hierarchy optimization in the Agg Group. Queries that 
use derived attributes will have higher latency than with Agg Group 
Hierarchies (with Normal Dimensions), but in some cases the difference in 
query latency is acceptable (in my case between 2 and 6 seconds more, 
depending on the query). Cube size and building time will be reduced.
  *   Use "Shard By" in the Rowkey for high-cardinality dimensions. I have not 
been able to test it yet, but according to 
https://kylin.apache.org/docs16/howto/howto_optimize_build.html it should work 
fine. This helps to reduce cube building time.

I hope this helps; I'm also still learning to use Kylin.

Kind Regards,
On 27/04/2017 at 12:46, 吕卓然 wrote:
Hi all,

Currently I am using Kylin 1.6.1 and I am facing a problem with cube building 
time. I have one fact table and two lookup tables. When I set 13 normal 
dimensions, 15 derived dimensions and two measures (count and count 
distinct), step 3 of the build takes around 20 minutes and the entire build 
takes around 1 hour. This is good.
However, when I increase to 15 normal dimensions, 15 derived dimensions and 
two measures (count and count distinct), step 3 of the build takes around 
240 minutes and the entire build takes forever….
BTW, I have a hierarchy dimension which consists of 4 normal dimensions.
I am really confused about this. Do 13 normal dimensions become a bottleneck 
in building the cube?

Thanks a lot!
Zhuoran

--
Roberto Tardío Olmos
Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.
28020 Madrid
Landline: 91.788.34.10
