Re: Kylin Cube Performance

Yiming Liu Mon, 01 Aug 2016 17:45:42 -0700

Hi Jason,

Cube design is the key to Kylin performance, not only for queries but also for
the cube building process. How you select dimensions, define the relationships
between dimensions, choose encoding methods, define measures, and even order
the HBase row key will all have a significant impact on performance. There are
quite a few good documents explaining how to do this:
http://kylin.apache.org/docs15/ .
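To give a rough idea of why dimension relationships matter so much for build time, here is a back-of-the-envelope sketch in Python. It assumes a simplified model of a single aggregation group: every unconstrained dimension doubles the number of cuboids, a mandatory dimension does not, a hierarchy of k dimensions contributes k + 1 combinations instead of 2^k, and a joint group of dimensions contributes 2 (all in or all out). The function `cuboid_count` is hypothetical and is not Kylin's own cuboid scheduler; exact counts in Kylin may differ slightly.

```python
from typing import Sequence


def cuboid_count(
    num_dims: int,
    mandatory: int = 0,
    hierarchies: Sequence[int] = (),
    joints: Sequence[int] = (),
) -> int:
    """Estimate the cuboids one aggregation group precomputes (simplified model).

    num_dims    -- total dimensions in the group
    mandatory   -- dimensions that appear in every cuboid
    hierarchies -- size of each hierarchy (e.g. year > month > day has size 3)
    joints      -- size of each joint group (included or excluded together)
    """
    free = num_dims - mandatory - sum(hierarchies) - sum(joints)
    if free < 0:
        raise ValueError("rules cover more dimensions than the group has")
    total = 2 ** free          # each free dimension is either in or out
    for size in hierarchies:
        total *= size + 1      # only prefixes: none, first, first two, ...
    for _ in joints:
        total *= 2             # the whole joint group is either in or out
    # Mandatory dimensions add no factor: they are always present.
    return total


if __name__ == "__main__":
    print(cuboid_count(10))                                             # no rules
    print(cuboid_count(10, mandatory=1))                                # one mandatory dim
    print(cuboid_count(10, mandatory=1, hierarchies=[3], joints=[2]))   # combined rules
```

With no rules, 10 dimensions give 1024 cuboids; one mandatory dimension halves that to 512, and adding a 3-level hierarchy plus a 2-dimension joint group brings it down to 128. Fewer cuboids generally means less to build and store, which is why the relationship settings in the cube design are worth tuning.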


One more thing: if you could share your cube design, it would be easier for
people here to help.

2016-08-02 7:20 GMT+08:00 Jason Hale <[email protected]>:

> I'm setting up a test case with a portion of our dataset to evaluate Kylin,
> and I'm not seeing the performance that I would expect.
>
> The cube building process is taking about 5-6 hours with ~69,000,000
> records and 10 dimensions. I'm not sure whether that's the expected build
> time, but the other problem is query performance after the cube is built.
>
> All tests used very simple queries (e.g. SELECT SUM(clicks)
> FROM reporting GROUP BY search_type).
>
> Grouping by 1 or 2 dimensions gives me very responsive queries (under 2
> seconds), but adding more dimensions drastically increases the query time
> (over 1 minute, and it times out through HBase). I would expect all of these
> queries to have similar query times, since each should hit its
> respective cuboid, so I'm not sure why performance would suffer. I
> didn't set up any special rules for the cube, but during the build process
> it listed all of the N-dimension cuboid steps and the log simply said 'skipped'.
>
> Is there something I'm missing in the configuration?
>
> I have an HDP cluster with 3 nodes and 1 client node on which Kylin is
> installed. Do I need to adjust the Hadoop configuration? I'm using mostly
> the default HDP settings.
>
> What more information can I provide?
>



-- 
With warm regards

Yiming Liu (刘一鸣)

