Re: Kylin Cube Performance

ShaoFeng Shi Mon, 01 Aug 2016 18:03:11 -0700

Hi Jason,

As Yiming mentioned, cube design matters for both build and query
performance. Please check "Optimize Cube" in the documentation on the
website and optimize the cube as much as possible.
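
To make the design point concrete, here is a rough sketch (not taken from
the Kylin code base) of why dimension design dominates build cost. It
assumes the simplified counting model from the cube optimization docs: a
cube with N independent dimensions has up to 2^N cuboids, a mandatory
dimension contributes no extra combinations, and a hierarchy of k levels
contributes k+1 combinations instead of 2^k. The function names and the
"1 mandatory, one 3-level hierarchy" split are hypothetical and chosen only
for illustration.

```python
def full_cuboid_count(num_dimensions: int) -> int:
    """Upper bound on cuboids when every dimension combination is built."""
    return 2 ** num_dimensions


def pruned_cuboid_count(num_normal: int, hierarchy_sizes=()) -> int:
    """Simplified count for a single aggregation group (an assumption,
    not Kylin's exact planner logic).

    num_normal      -- dimensions that may freely appear or not (2 choices each)
    hierarchy_sizes -- levels in each hierarchy; k levels give k + 1 combinations
    Mandatory dimensions are always present, so they add no combinations
    and are simply left out of the count.
    """
    count = 2 ** num_normal
    for levels in hierarchy_sizes:
        count *= levels + 1
    return count


if __name__ == "__main__":
    # 10 dimensions with no pruning rules at all.
    print(full_cuboid_count(10))                            # 1024
    # Same 10 dimensions: 1 mandatory, one 3-level hierarchy, 6 normal.
    print(pruned_cuboid_count(6, hierarchy_sizes=(3,)))     # 256
```

With 10 dimensions and no rules, the model gives 1024 cuboids; the real
number depends on the aggregation groups actually defined in the cube.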


Besides, the cluster's capacity and Hadoop configuration are also
important factors. Try to identify the bottleneck, then optimize or add
capacity.

From 1.5, Kylin ships with two cubing algorithms. The "Build N-Dimension
Cuboid" steps belong to the legacy "Layered" cubing algorithm; they are
skipped when Kylin chooses the new "Fast" cubing algorithm, which runs as
the "Build Cube" step after them. Please click the Hadoop link in that step
to inspect the MR job's statistics.

Hope this helps to some extent.



2016-08-02 8:44 GMT+08:00 Yiming Liu <[email protected]>:

> Hi Jason,
>
> Cube design is the key to performance in Kylin, for the cube building
> process as well as for queries. How you select dimensions, define the
> relationships between dimensions, choose the encoding method, define
> measures, and even order the HBase key all have a significant impact on
> performance. There are quite a few excellent documents explaining how to
> do this: http://kylin.apache.org/docs15/ .
>
> One more thing: if you share your cube design, it will be easier to get
> help here.
>
> 2016-08-02 7:20 GMT+08:00 Jason Hale <[email protected]>:
>
> > I'm setting up a test case for a portion of our dataset, to evaluate Kylin
> > and I'm not seeing the performance that I would expect.
> >
> > The cube building process is taking about 5-6 hours with ~69,000,000
> > records and 10 dimensions. I'm not sure if that's the expected build time,
> > but the other problem is the query performance after building the cube.
> >
> > All queries were tested with a very simple query (e.g. SELECT SUM(clicks)
> > FROM reporting GROUP BY search_type)
> >
> > Grouping by 1 or 2 dimensions gives me very responsive queries (under 2
> > seconds), but adding more dimensions drastically increases the query time
> > (over 1 minute, and it times out through HBase). I would expect these
> > queries all to have similar query times, since they should query the
> > respective cuboid, so I'm not sure why the performance would suffer. I
> > didn't set up any special rules for the cube, but during the build process
> > it showed all the N-dimension cubes and the log simply said 'skipped'.
> >
> > Is there something I'm missing in the configuration?
> >
> > I have an HDP cluster with 3 nodes and 1 client node on which Kylin is
> > installed. Do I need to adjust the Hadoop configuration? I'm using most of
> > the default HDP settings.
> >
> > What more information can I provide?
> >
>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>



-- 
Best regards,

Shaofeng Shi
